ABSTRACT


The present thesis entitled “Speech Analysis and Synthesis of the Punjabi 
Language” embodies the investigations carried out by me at the School of 
Mathematics and Computer Applications (SMCA), under the supervision of Dr. 
Satvinder Singh Bhatia (Professor, SMCA, Thapar University, Patiala). 


The quality of the synthesized speech, also known as (aka) synthetic speech, has 
been an important issue for speech researchers. This project is an attempt to address this 
important issue of synthesizing high quality speech in the Punjabi language. The Punjabi 
language is a modern Indo-Aryan language. The Punjabi language has been ranked 
amongst the top 15 spoken languages of the world. Over the years, this ranking has 
varied between 10 and 18. With more than 90 million native speakers, and more than 140 
million speakers in 150 countries of the world, Punjabi is considered amongst the
world’s 10 most influential languages, and is included in an article by the
same name, “The World’s 10 Most Influential Languages” by George Weber. Andronov 
reports that the Institute of Oriental Studies of the Russian Academy of Sciences has 
prepared and preserved dozens of reports on the Punjabi language. Any research on the 
Punjabi language, therefore, assumes an international significance. 


There are many dialects of the Punjabi language, written mainly in two scripts.
Punjabi University (Patiala, India) published a list of 31 dialects of the Punjabi language. 
In the Punjab state of India, the Punjabi Language is written in the Gurmukhi script, 
whereas in the Punjab state of Pakistan, the Punjabi language is written in the Persian (or 
Shahmukhi) script. It will be an enormous task to deal with all dialects of both scripts of 
the Punjabi language. This research work, therefore, concentrates only on one dialect (the 
Malwai dialect) and one script (the Gurmukhi script) of the Punjabi language. 


This research work has been carried out with the following three objectives in
view, as set by the Institute Research Board (IRB):


(i) To design a new phonetic alphabet for speech processing in the Punjabi language 
consistent with the ARPAbet because at present, the symbols consistent with the 
ARPAbet phonetic transcription do not exist in the Punjabi Language. 


(ii) To develop a new text and speech corpus for the Punjabi language because no
computer database/corpus exists in the Punjabi language, where the representative
speech sentences concentrate on a specific dialect (the Malwai dialect) of the
Punjabi language.


(iii) To conduct the linear prediction analysis and synthesis (aka Linear Predictive
Coding or LPC) of the Punjabi speech sentences because no work has so far been
reported in the literature where the Punjabi speech has been analyzed/synthesized
using the linear prediction model of speech production.


The first issue is the actual representation of the phonemes. In linguistics, the 
International Phonetic Alphabet (IPA) designed by the International Phonetic Association 
is used to represent the phonemes. However, the major limitation of this representation is 
that it needs some special symbols that are not readily available on computer keyboards. 
In this work, a new phonetic alphabet consistent with the ARPAbet phonetic transcription 
of the Punjabi language has been developed in CHAPTER II. This newly designed
scheme called PUNJARPAbet has been graphically presented in CHAPTER VI. This 
achieves our first objective set for this study. 


The second issue is to develop a new corpus suitable for achieving the third 
objective. A new text and speech corpus with several original features has been 
developed in CHAPTER IV. The contents of this chapter meet the second objective set 
for this research. The new corpus has at least twenty original features such as complete 
sentences (rather than words only), some lines of the folk songs in the original form and 
the modified form, some single line folk songs (bolis), some newly-written bolis, some 
slowly fading words of the Punjabi language, and some theth pendu shabads (rustic 
words). In the corpus developed in this work, the speech sentences have been recorded in 
the Malwai dialect of the Punjabi language, and the sentences have been written in the 
Gurmukhi script. The fact that the new corpus is very rich and versatile due to a wide 
variety of its linguistic and cultural features makes it an ideal corpus for any serious 
speech processing work in the Punjabi language. 


The third issue is to analyze and synthesize the Punjabi speech. No work has so 
far been reported in the literature where the Punjabi speech has been
analyzed/synthesized using the Linear Predictive Coding (LPC). This work investigated 
the linear prediction analysis and synthesis of the Punjabi speech for the first time. The 
linear prediction technique is rated very high in the broad field of speech processing. This 
technique has been chosen because of its popularity, sound mathematical properties, 
simplicity, and its effectiveness as evidenced by the SPEECH UNDERSTANDING 
RESEARCH (SUR) project of the Advanced Research Projects Agency (ARPA) started 
in the early 1970s. The third objective set for this project has been accomplished in
CHAPTER V and graphically presented in CHAPTER VI. 


The present thesis consists of seven chapters, and five appendices. The work 
carried out in this thesis can be described as follows: 


Chapter I is introductory and it presents the basic concepts required in this thesis. 
Chapter II introduces the main concepts required to understand the broad field of 
Linguistics so that one can understand the corpus designed in Chapter IV. A new 
phonetic alphabet (called PUNJARPAbet) consistent with ARPAbet is designed in this 
chapter. This chapter achieves objective 1 set for this research. Chapter III prepares the 
reader for the linear prediction speech analysis and synthesis of the Punjabi language to
be conducted in Chapter V. In Chapter IV, a new text and speech corpus has been 
designed. It covers objective 2 set for this study. Chapter V describes in detail the 
mathematical concepts involved in the Linear Prediction analysis and synthesis of the 
Punjabi speech sentences. It accomplishes objective 3 set for this project. In Chapter VI, 
the spectrographic analysis is conducted by using Praat and MATLAB where the graphs 
for formant frequencies, intensity, pitch, V/UV contours, and the Fast Fourier Transform 
(FFT) of the speech waveforms have been plotted to gain additional insight into the 
results obtained in Chapter V. The graphical analysis evaluating the newly-designed 
coding scheme PUNJARPAbet is also included in Chapter VI. Chapter VII summarizes
the whole work, highlighting the original contribution of this project. This chapter 
concludes by stating the future directions. 


The thesis also includes five Appendices. Appendix A and Appendix B include 
the Text Corpus developed in this project. Appendix A includes about 200 sentences and 
single line folk songs (bolis) while Appendix B includes more than 100 bolis. In both 
appendices, the complete corpus has been transcribed both in the IPA and the newly 
developed phonetic alphabet PUNJARPAbet. Appendix C describes the Levinson- 
Durbin algorithm to efficiently solve the Normal Equations resulting from the 
Autocorrelation Method for the linear prediction analysis. Appendix D includes the SIFT 
algorithm for computing the V/UV decision and the pitch extraction. Appendix E lists 
twenty-five Punjabi speech sentences synthesized in the present study. 


The thesis concludes by listing the Bibliography of various publications and 
websites cited in this work. 
CHAPTER I
INTRODUCTION 


The purpose of this chapter is to introduce the basic concepts involved in this 
work. These basic concepts include: Official Languages of India, Punjabi language, 
Dialects of the Punjabi Language, Importance of Dialects, Malwai Dialect, Digital 
Signal/Speech Processing, Biometrics and Speech Processing, Linear Prediction, Need of 
a New Corpus, Computerization of the Punjabi Language, Objectives of the Study, and 
the Thesis Organization. 


1.1. OFFICIAL LANGUAGES OF INDIA 


The 8th Schedule to the Constitution of India, as of May 2008, lists 22 languages
as the official languages of India. Out of the thousands of languages and dialects of 
India, 29 languages have more than a million native speakers, 60 languages have more 
than 100,000 speakers and 122 languages have more than 10,000 native speakers as per 
the most recently available census of India (2001). The 22 official languages with the 
most recent number of the native speakers in millions, written in brackets, are listed 
below [141]: 


 1. Hindi (258-422)           12. Maithili (12-32)
 2. Bengali (83)              13. Assamese/Axomiya (13)
 3. Telugu (74)               14. Santhali (6.5)
 4. Marathi (72)              15. Kashmiri (5.5)
 5. Tamil (61)                16. Konkani (2.5-7.6)
 6. Urdu (52)                 17. Nepali (2.9)
 7. Gujarati (46)             18. Sindhi (2.5)
 8. Kannada (38)              19. Dogri (2.3)
 9. Malayalam (33)            20. Manipuri or Meitei or Meithei (1.5)
10. Oriya (33)                21. Bodo (1.4)
11. Punjabi (29)              22. Sanskrit (0.01)


Most of these languages (15 to be exact) belong to a family of languages known 
as the Indo-Aryan languages (a sub-branch of the Indo-European family, spoken by 74% 
of Indians), the Punjabi language being one of these 15 languages. The following four 
languages, spoken by 23% of Indians, belong to another family of languages known as 
the Dravidian languages: Tamil, Telugu, Kannada, and Malayalam. The Tibeto-Burman 
family includes these two languages namely: Manipuri (or Meitei or Meithei), and Bodo. 
One language (Santhali) belongs to the Munda family, giving a total of
15 + 4 + 2 + 1 = 22.


1.2. PUNJABI LANGUAGE


The Punjabi language is a modern Indo-Aryan language. The Punjabi language 
has been ranked amongst the top 15 spoken languages of the world. Over the years, this 
ranking has varied between 10 and 18. With more than 90 million native speakers, and 
more than 140 million speakers in 150 countries of the world, Punjabi is considered
amongst the world’s 10 most influential languages, and is included in an
article by the same name, “The World’s 10 Most Influential Languages” by George 
Weber [130]. The most recent confirmation is an inset published by the daily newspaper 
Hindustan Times, New Delhi, India (one day after the International Mother Tongue day 
2013 on Friday; February 22, 2013; pp. 16) under the title LANGUAGES WORLD 
SPEAK MOST where Punjabi has been ranked number 10 as follows: Mandarin (14.1%), 
Spanish (5.85%), English (5.25%), Hindi (4.46%), Arabic (4.24%), Portuguese (3.08%), 
Bengali (3.05%), Russian (2.42%), Japanese (1.92%), and Punjabi (1.44%). Andronov 
[6] reports that the Institute of Oriental Studies of the Russian Academy of Sciences has 
prepared and preserved more than three dozen reports on the Punjabi language. Any 
research on the Punjabi language, therefore, assumes an international significance. 


1.3. DIALECTS OF THE PUNJABI LANGUAGE 


Punjabi was the native language of the Punjab state of undivided India. In
1947, when the British rulers partitioned India into two countries (India and Pakistan), 
the Punjab state was also bifurcated into two states: East Punjab (in India), and West 
Punjab (in Pakistan). There are many dialects of the Punjabi language in both countries. 
Punjabi University (Patiala, India) published a list of 31 dialects of the Punjabi language 
[142] as follows: Awankari, Baar di Boli, Banwali, Bhattiani, Bherochi, Chacchi, 
Chakwali, Chambiali, Chenavri, Dhani, Doabi, Dogri, Ghebi, Gojri, Hindko, Jatki, 
Jhangochi/Jhangi, Kangri, Kachi, Lubanki, Malwai, Majhi, Pahari, Pothohari/Pindiwali, 
Powadhi/Puadhi, Punchi, Peshori/Peshawari, Rathi, Swaen, Thalochri, Wajeerawadi. 


1.4. IMPORTANCE OF DIALECTS 

The importance of studying and computerizing various dialects of a language
should not be underestimated. Many researchers from all over the world have
emphasized the importance of dialects. Three representative examples showing their
importance are given below:


(A) In the famous text Linguistic Atlas of the Punjab [43, pp. v], André Martinet’s 
comments about dialect are noteworthy: 
1. “A dialect is not the language plus a few appendages; it is a whole that
deserves to be considered and studied as such.”

2. “ ‘Dialect’ is one of the most ambiguous terms in the field of linguistics.
Etymologically and originally, it is used in reference to varieties of the
same language. In old Greece, Athenians, Boeotians and Spartans were all
supposed to speak Greek, but with a number of serious differences which
however did not seem to prevent mutual understanding.”

3. “There is no form of the language that is not a dialect.”


(B) An article “A simple introduction to the importance of studying dialects and 
language” published by Rhodes University [143] describes the importance of language, 
dialect and identity as follows: “Language says a lot about our identity. Americans, 
Australians, New Zealanders and South Africans all speak differently. When we meet 
somebody from a different part of the country, they may use different words, sounds or 
grammatical structures. A dialect is a variety of language that is characteristic of a certain 
area. For instance, in the Northern Cape, people refer to older people as grootmense and 
paper as pampier whereas in Pretoria they are called oumense and papier. If you hear 
coloured people from Cape Town speaking Afrikaans, they sound different to Afrikaans 
spoken elsewhere. People from Natal speak English in different ways to people from 
Johannesburg etc. So often, the way we speak says a lot about where we are from, who 
we are and what we care about. So studying dialects is one way of validating people's 
identities and ways of life.” 


The same article [143] establishes a strong connection between dialects and 
computers as follows: “Over the last ten years or so, computational linguists have 
managed to get computers to understand human language -- to a limited degree. But 
computers are pretty dumb -- they can learn to understand one person but when they hear 
another variety of language, they tend to get confused. One way of solving this would be 
to let computers listen to many different varieties and dialects of a language. If the 
computer listens long enough, it can learn to recognize many different people speaking 
many different languages. Of course, this can only happen if we study dialects and record 
people who speak them.” 


(C) The importance of language varieties and regional dialects can be understood 
better when we come across a host of varieties of the English language. The following 
varieties have been frequently mentioned in literature: American English, British English, 
Australian English, Canadian English, Indian English, Jamaican English, New Zealander 
English, South African English, Mexican English, French English, Chinese English, 
Russian English, German English, and Italian English. 


Dialects affect many characteristics of the human beings including the 
pronunciation/accent, spellings, vocabulary, and grammar of the speakers and authors. 
The differences of the pronunciation/accent amongst various people are obvious in the 
real world scene, where the pronunciation/accent of any two speakers of the same 
language is rarely the same. Examples of the other characteristics
(spellings, vocabulary, and grammar) are given here. As far as spellings are concerned,
various authors spell the same words differently. Some examples are: Programme / 
Program, Honor / Honour, Color / Colour. Various speakers use different vocabulary to 
describe the same items. Some examples are: Truck / Lorry, Elevator / Lift, Soccer / 
Football. 


Different people use different grammar (or grammatical structure) to describe the 
same situation. Some examples are: 


1. Call him / Give him a call / Phone him / Ring him / Ring him up.
2. You don’t need to fix that car / You needn’t fix that car. 
3. I have three children / I’ve got three children. 


Similar examples can be found in various dialects of other languages. These examples 
convincingly prove that it is important to study various dialects of a language. 


1.5. MALWAI DIALECT 


The main dialects of the Punjabi language [142] in India are: Malwai, Majhi, 
Doabi, and Puadhi, and the main dialects in Pakistan are: Multani, Pothohari, and 
Lehndi. In the Punjab state of India, the Punjabi language is written in the Gurmukhi 
script, whereas in the Punjab state of Pakistan, the Punjabi language is written in the 
Persian (or Shahmukhi) script. It will be an enormous task to design a corpus that can 
completely describe all dialects in both scripts of the Punjabi language. (Simply stated, a
corpus is a large body of text in the natural state as recorded speech or written text.) This
work, therefore, concentrates only on one dialect of the Punjabi language: the Malwai 
dialect. The Malwai dialect has been chosen because of the following major reasons: 


1. At present, there are twenty-two districts in the Punjab state [144]. According 
to Wikipedia, Malwa makes up the majority of the Punjab region [145], 
consisting of 11 districts (Barnala, Bathinda, Faridkot, Fazilka, Ferozepur, 
Ludhiana, Mansa, Moga, Muktsar, Patiala, Sangrur) and parts of the 12th
district named Fatehgarh Sahib. 

2. Thapar University (formerly Thapar Institute of Engineering & Technology), 
where this project is being completed, is located in the city of Patiala. The city
of Patiala belongs to the Malwa region. 

3. The Language Department Punjab and Punjabi University are both also 
located in the city of Patiala. (Punjabi University Patiala is the world’s second
university to be named after a language, the first one being the Hebrew
University of Israel [146]). 

4. The Sahitya Academy Award is the highest literary award of India bestowed 
annually on an author for his outstanding contribution to literature, one for 
each Modern Indian Language. So far, seventeen out of the fifty-two winners
of this award in the Punjabi literature [147] belong to the Malwa region 
(clearly demonstrating that Malwai authors are not lagging behind in this 
area). 

5. The author was born and raised in a small village in the Malwa region of the 
Punjab state, and therefore, is more familiar with the Malwai dialect than any 
other dialect. 


The Malwai dialect in the Punjab state of India is written in the Gurmukhi script [140]. 
Therefore, in the corpus designed in this thesis (Chapter IV), the speech sentences have 
been recorded in the Malwai dialect of the Punjabi language, and the sentences have been 
written in the Gurmukhi script. 


1.6. DIGITAL SIGNAL/SPEECH PROCESSING 


Digital signal processing [11-12, 15, 20, 24-25, 79, 83, 85-90, 99-102] has been 
successfully applied to many types of signals including telecommunications signals, 
audio signals, image processing, radar signals, sonar signals, signals in geophysics, and 
speech signals. Digital speech processing includes many fields. Some of these fields of 
on-going research interest include: 

• speech analysis, e.g., linear prediction (LP) analysis [11-12, 52, 74-76]

• speech synthesis, e.g., LP synthesis, text-to-speech (TTS) synthesis [126]

• speech enhancement or noise cancellation [87-89]

• speech coding or speech storage and transmission [23, 47, 65, 77, 82]

• speaker separation [79, 83, 99]

• speaker identification [79, 83, 99]

• language identification [59, 92-94]

• automatic speech recognition (ASR), which also includes continuous
speech recognition (CSR), discrete utterance recognition, and keyword
spotting [79, 83, 99]

• pitch and formant estimation [90, 99]

• aids to the handicapped [79, 83, 99]


This list can be enhanced to include many more fields. This thesis concentrates on 
the topic Speech Analysis and Synthesis of the Punjabi Language. The quality of the 
synthesized speech (or synthetic speech) has been an important issue for speech 
researchers. The following comments by Bruce Sherwood (a multi-dimensional 
researcher knowledgeable in Engineering Science, Physics, Software design, Linguistics,
and several languages including Spanish, Italian, French, Russian, Persian, and 
Esperanto) in his article The Computer Speaks [101, pp. 18] are true even today: 
“Synthetic speech of good quality is hard to achieve because of dynamic properties of the 
human-speech mechanism that have to be simulated. Research, which is on-going, has 
thus far yielded mixed results. The basic principles of what components are needed to
generate human speech are known (IEEE Spectrum, October 1970, pp. 22-45), and the 
quality of synthetic speech has been increasing in laboratories. But synthetic speech is 
still unacceptable to critical human ears. Listeners are not comfortable hearing plastic 
talk, and so methods have yet to be devised for a good human-speech synthesizer.” 


This project is an attempt to address this important issue of producing high quality 
synthesized speech or synthetic speech in the Punjabi language. 


1.7. BIOMETRICS AND SPEECH PROCESSING 


Some forms of human identification, such as one’s physical traits, are very
difficult to fake or imitate. Biometrics is the science of measuring individual body
characteristics. Biometrics uses some of those physical characteristics or traits of a person 
in security devices, which are ‘not easily fakable’ by any other person. Voice/speech is 
amongst the most important physical traits and biological characteristics used in the 
biometric security devices. Other such biological characteristics used in these devices 
include the lips, the handprints, the fingerprints, the blood vessels in the back of the 
eyeball, and even one’s entire face [50, pp. 14.25]. Biometric systems are used in lieu of 
typed passwords to identify the people authorized to use a computer system [50, pp. 
3.20]. According to Etter [35, pp. 220-221], the most common biometrics include 
fingerprints, face, iris, DNA, and speech, whereas other biometrics include bones, hand 
recognition, and handwriting. The fact that the speech analysis and synthesis is 
extensively used in the Biometric systems enhances the significance of the present thesis. 


1.8. LINEAR PREDICTION 


The linear prediction technique is also known as Linear Predictive Coding or LPC
[Rabiner and Schafer, 89, pp. 473]. Almost every book on Speech Processing includes a 
chapter on Linear Prediction, and usually starts by attesting the simplicity, power, and 
the popularity of this technique. Markel and Gray Jr. have written the book Linear 
Prediction of Speech [76] that includes a number of comments about the merits of this 
technique including speed, popularity, simplicity, and stability. In addition to speech 
processing, linear prediction (LP) is a fundamental tool in many diverse areas such as 
adaptive filtering, system identification, economics, geophysics, and spectral estimation 
[15, pp. 121]. 
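
As a rough illustration of what linear prediction analysis involves, the autocorrelation method reduces to solving a Toeplitz system of normal equations, which the Levinson-Durbin recursion does efficiently. The following is only a minimal sketch and not the implementation developed in this thesis; the function names, the sign convention (a[0] = 1, with the prediction error written as e[n] = sum over k of a[k] s[n-k]), and the use of NumPy are illustrative assumptions.

```python
import numpy as np


def autocorrelation(frame, order):
    """Return the autocorrelation lags r[0..order] of one (windowed) frame."""
    n = len(frame)
    return np.array([np.dot(frame[: n - k], frame[k:]) for k in range(order + 1)])


def levinson_durbin(r, order):
    """Solve the LP normal equations for the given autocorrelation lags.

    Returns (a, err), where a[0] == 1 and the prediction error is assumed
    to be e[n] = sum_k a[k] * s[n - k]; err is the final prediction-error
    energy.
    """
    a = np.zeros(order + 1)
    a[0] = 1.0
    err = r[0]
    for i in range(1, order + 1):
        # Reflection coefficient for stage i.
        acc = r[i] + np.dot(a[1:i], r[i - 1:0:-1])
        k = -acc / err
        # Update the predictor coefficients a[1..i] from the previous stage.
        a_prev = a.copy()
        for j in range(1, i):
            a[j] = a_prev[j] + k * a_prev[i - j]
        a[i] = k
        # The error energy shrinks by (1 - k^2) at every stage.
        err *= 1.0 - k * k
    return a, err
```

In a typical use, a speech frame would first be multiplied by a window (for example, `np.hamming(len(frame))`), passed to `autocorrelation`, and the resulting lags passed to `levinson_durbin`; the predictor order is a free parameter here.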


This work concentrates on the Linear Prediction Analysis and Synthesis of the 
Punjabi language because 

1. no such work has so far been reported in the literature, and 

2. we have been active in this area of research [26-31] since the early 1980s.
In addition to these two reasons, two more extremely important reasons, namely
popularity and the Speech Understanding Research (SUR) Project, are detailed below.


1.8.1. POPULARITY 


The linear prediction technique is rated very high in the broad field of speech 
processing as evidenced by many famous speech researchers. Some representative 
quotations are given below: 


“(one of) the most popular parametric representations in use today” [Allen et al., 4, pp. 
10]; 


“linear prediction plays a fundamental role in all aspects of speech.” [Benesty et al., 
15, pp. 121]; 


“The (linear prediction) technique has been the basis for so many practical and 
theoretical results that it is difficult to conceive of modern speech technology without 
it” [Deller et al., 24, pp. 266]; 


“the most powerful analysis method in every respect” [Lass, 67, pp. 484]; 


“an extremely powerful technique for the digital analysis of speech” [Oppenheim, 79, 
pp. 156]; 


“probably the most powerful tool in the speech-processor’s armamentarium” [Parsons, 
83, pp. 345]; 


“One of the most powerful speech analysis techniques is the linear predictive 
analysis...The importance of this method lies both in its ability to provide extremely 
accurate estimates of the speech parameters, and its relative speed of computation.” 
[Rabiner and Schafer, 88, pp. 396]; 


“Linear prediction is a powerful tool in speech processing. Linear prediction is widely
used in speech applications (recognition, compression, modeling, etc.)” [Markel and
Gray, 76, pp. 271]


In his recent article The History of Linear Prediction [11], Atal states that “the
introduction of linear prediction techniques started a new era in speech processing about
40 years ago. Since then, these techniques have found numerous applications.”


Based on linear predictive coding, Texas Instruments introduced a toy named 
“Speak and Spell” in 1978. The toy enjoyed a high level of popularity for several years. 
1.8.2. LINEAR PREDICTION AND SPEECH UNDERSTANDING RESEARCH
(SUR) PROJECT 


The Advanced Research Projects Agency (ARPA) of the USA started its famous
Speech Understanding Research (SUR) Project in the early 1970s with a 5-year time limit
(1971-76) for achieving specific goals in the area of continuous speech
recognition. A large number of speech research groups participated in the ARPA SUR
project. Rabiner and Juang [15, pp. 527] conclude that eight technological contributions 
came out of this five-year project. Out of these eight, the first three prominently revolve 
around Linear Predictive Coding (LPC), or simply linear prediction, as follows: 


1. Use of LPC as the front-end spectral processor became the standard for speech 
recognition systems [Atal and Hanauer (12), Itakura and Saito (15, 52), 
Makhoul (74)]. 


2. Methods for reliably estimating vocal tract areas and vocal tract length from LP 
representations [Wakita, 15]. 


3. Use of the LPC residual as a sound similarity measure [Itakura (15, 52)] 


The above discussion proves that the linear prediction technique is hugely 
popular. Therefore, it makes sense to conduct the speech analysis and synthesis of the 
Punjabi language by using the classical linear prediction technique, and to design new 
speech processing techniques similar to LPC. 
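
To make the analysis/synthesis pairing concrete, the sketch below shows how an LP residual can be obtained by inverse filtering and how the same coefficients drive an all-pole synthesis filter. It is a hedged illustration under stated assumptions, not the procedure of Chapter V: the function names are hypothetical, the coefficient vector uses the convention a[0] = 1, and zero initial conditions are assumed.

```python
import numpy as np


def lp_residual(signal, a):
    """Analysis (inverse) filter: e[n] = sum_k a[k] * s[n - k], with a[0] == 1."""
    # Full convolution truncated to the signal length gives the FIR output
    # under zero initial conditions.
    return np.convolve(signal, a)[: len(signal)]


def lp_synthesize(excitation, a):
    """Synthesis (all-pole) filter: s[n] = e[n] - sum_{k>=1} a[k] * s[n - k]."""
    order = len(a) - 1
    out = np.zeros(len(excitation))
    for n in range(len(excitation)):
        acc = excitation[n]
        for k in range(1, order + 1):
            if n - k >= 0:
                acc -= a[k] * out[n - k]
        out[n] = acc
    return out
```

Feeding the exact residual back through `lp_synthesize` recovers the original frame; an LPC synthesizer instead replaces the residual with a simple excitation (for example, a pulse train for voiced frames or noise for unvoiced frames), which is where the V/UV decision and the pitch estimate enter.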


1.9. NEED FOR A NEW CORPUS 


In very simple words, a corpus is a large body of text in the natural state as 
recorded speech or written text. Designing databases and corpora is generally considered 
the first major step in speech processing. The quality of the corpora and databases plays a
decisive role in determining the quality of the speech processing work tested and based 
on these corpora and databases. 


This work concentrates on the Linear Prediction Analysis and Synthesis of the 
Punjabi language. Since no such work has so far been reported in the literature, it is
evident that no corpus suitable to be used in conjunction with this technique exists in the 
Punjabi language. 


The corpus designed in this work will be one of the most suitable corpora to be 
used in the Digital Speech Processing techniques which make use of the complete 
sentences (rather than words only) for analyzing and synthesizing speech (e.g., the 
classical linear prediction analysis and synthesis technique of the Punjabi language in this 
thesis). 


1.10. COMPUTERIZATION OF THE PUNJABI LANGUAGE 


The Computerization of the Punjabi Language is a relatively new field of 
research. This field of research started attracting the attention of the research scholars 
towards the end of the twentieth century. Several organizations and individuals have been 
actively pursuing the research in this field throughout the world. In 2004, Agrawal, 
Samudravijaya and Arora [3, pp. 25] compiled a preliminary list of the on-going 
activities, and the institutes, active in the text and speech corpora development in Indian 
Languages. We have updated this list specifically for the activities in the Punjabi 
Language. At present, the following institutions are active in the field of the 
computerization of the Punjabi Language: 


1. Thapar University (formerly TIET), Patiala, India. [2, 63, 120, 68-70]

2. Punjabi University (Advanced Centre for Technical Development of
Punjabi Language, Literature and Culture), Patiala, India. [2, 62-63, 68-70,
120]

3. Central Institute of Indian Languages (CIIL), Mysore, India. [2, 34]

4. Northern Regional Language Centre (NRLC), Patiala, India. [2, 34]

5. Centre for Development of Advanced Computing (CDAC), Ministry of
Communications & IT (Govt. of India), Noida, India. [2, 120]

6. CFSL, Chandigarh, India. [2]

7. CSIO, Delhi, India. [2]

8. Kamrah International Institute of Information Technology (KIIT),
Gurgaon, India. [2-3, 7, 32]

9. Thompson Rivers University, Kamloops, BC, Canada. [26-30, 63]


In addition to the researchers associated with the above institutions having 
research funding, the following individuals are known to be very seriously active in the 
field of computerization of the Punjabi Language on voluntary basis without externally 
supplied research funding: 


1. Dr. Kulbir Singh Thind (USA). [62-63, 127]

2. Janmeja Singh Johl (Ludhiana, India). [54-55, 62-63]

3. Kirpal Singh Pannu (Canada). [62-63, 80-81]

4. Baba Baljinder Singh (Rara Sahib, India). [62-63, 107-108]

5. Dr. Baldev Kandola (UK). [62-63]


Their activities include many diverse areas such as Punjabi font development, 
keyboard standardization, font conversion, transliteration between the Gurmukhi script 
and the Shahmukhi (Persian) script, word processor development including spell checker 
and thesaurus, search engine development, availability of the original text of Sri Guru 
Granth Sahib and other prominent Sikh scriptures along with their translations on the 
internet, optical character recognition, text-to-speech (TTS) synthesis, dictionary 
development, machine translation, Unicode implementation, speech processing and many
more. 


Lehal [62-63, 68-70, 115], and Singh [63, 70], both currently working with 
Punjabi University Patiala, have authored more papers on a wide variety of topics, than 
any other researcher mentioned above, in the fast growing field of the computerization of 
the Punjabi language. 


Computer-related books have also been published recently. Kamboj [60-61] 
started publishing his computer-related books (e.g., [61]) in June 2003, followed by 
Juneja [58] in 2004, and Pannu [81] in February 2009. 


The above-mentioned list is a preliminary list, and a very brief overview of the 
activities to the best of our present knowledge. There may be other institutions and 
individuals actively involved in this field. Considering the fact that the computerization 
of the Punjabi Language is still at its initial stage of problems, struggles, controversies, 
and challenges, it is our sincerest hope and belief that the other researchers and authors 
will continue to update our preliminary list in a fact-based scientific and unbiased 
professional manner. 


1.11. OBJECTIVES OF THE STUDY 

In specific terms, the objectives of this study are as under: 

(i) To design a new phonetic alphabet for speech processing in the Punjabi language 
consistent with the ARPAbet because at present, the symbols consistent with the 
ARPAbet phonetic transcription do not exist in the Punjabi Language. 

(ii) To develop a new text and speech corpus for the Punjabi language because no 
computer database/corpus exists in the Punjabi language, where the representative 
speech sentences concentrate on a specific dialect (the Malwai dialect) of the 
Punjabi language. 

(iii) To conduct the linear prediction analysis and synthesis (aka Linear Predictive 
Coding or LPC) of the Punjabi speech sentences because no work has so far been 
reported in the literature where the Punjabi speech has been analyzed/synthesized 
using the linear prediction model of speech production. 


1.12. THESIS ORGANIZATION 


The aim of the present thesis is to investigate speech analysis and synthesis of 
the Punjabi language using the linear prediction technique (aka Linear Predictive 
Coding or LPC). The present thesis consists of seven chapters and five appendices. Each 
chapter is divided into sections, subsections, and sub-subsections on an as-needed 
basis. A number such as 5.2.1.1 indicates sub-subsection 1 of subsection 1 of section 2 of 
Chapter V. The numbers in square brackets refer to the references cited in this work 
(Books, Journals, Dictionaries, Encyclopaedias, and Internet websites) from the 
Bibliography. The work carried out in this thesis can be described as follows: 


CHAPTER I is introductory. It presents the basic concepts required in this thesis 
to provide sufficient background for later chapters. In this chapter, the notation and 
terminology to be used in the sequel are presented. A résumé of the hitherto known results 
related to our results, along with a brief plan of our results, has also been given in 
this chapter. It introduces the following items: Official Languages of India, Punjabi 
Language, Dialects of the Punjabi Language, Importance of Dialects, Malwai Dialect, 
Digital Signal/Speech Processing, Biometrics and Speech Processing, Linear Prediction, 
Need for a New Corpus, Computerization of the Punjabi Language, Objectives of the 
Study and Thesis Organization. Some of the basic concepts covered in this chapter will 
be repeated occasionally in various chapters for the sake of completeness. 


CHAPTER II introduces the main concepts required to understand the broad field 
of Linguistics so that one can understand the corpus to be designed in CHAPTER IV. 
After the brief introduction, it discusses the Phonetics with reference to the English 
Language, The Punjabi Language and Gurmukhi Script, The Punjabi Speech Sounds, 
Classification of the Punjabi Sounds, and Phonetic Coding Schemes. The new phonetic 
alphabet (consistent with ARPAbet), called PUNJARPAbet, is designed in this chapter. 
This chapter achieves objective 1 set for this research. 


CHAPTER III deals with the Speech Signal, the Speech Sound and Speech 
Processing, and various Speech Production Models including the Linear Speech 
Production Model, the Digital Model of Speech Production, and the Linear Prediction 
Model of Speech Production (to be used in CHAPTER V), and Speech Signal Processing: A 
Grand Challenge, followed by the conclusion. 


In CHAPTER IV, a new text and speech corpus has been designed. It describes 
the Corpus Design, Examples of the corpus, Overview of the New Corpus, Recording of 
the New Corpus, and Technical Considerations. This chapter concludes by describing the 
Original Features of the new Corpus. This chapter accomplishes objective 2 set for this 
study. 


CHAPTER V describes in detail the mathematical concepts involved in the LPC 
of the Punjabi speech. The topics addressed in this chapter include Linear Prediction 
Analysis of Speech, Linear Prediction Coefficients (Autocorrelation method and 
Covariance method), Gain factor, V/UV Decision and Pitch Extraction, Linear Prediction 
Synthesis of Speech; Pitch, Speech Synthesis, and Tones, Speech Processing 
Considerations, Analysis and Synthesis Considerations, Analysis and Synthesis 
(Combined), Computational Savings, and the Punjabi Sentences Synthesized. The speech 
synthesis results achieved in this chapter using software programs MATLAB and Praat 
are described in the Conclusion. This chapter fulfills objective 3 set for this project. 


CHAPTER VI includes the graphical analysis of the results obtained in the 
previous two chapters: (a) Spectrographic analysis of speech waveforms of Chapter V (b) 
Graphical evaluation of the new phonetic coding scheme PUNJARPAbet designed in 
CHAPTER II. The graphs for formant frequencies, intensity, pitch, and V/UV contours 
are presented in this chapter. Additional insight into the results obtained in CHAPTER V 
has been provided by the spectrographic analysis conducted using Praat and MATLAB 
where the Fast Fourier Transform (FFT) of the speech waveforms has been plotted. Some 
of the features of the corpus designed in CHAPTER IV and transcribed using 
PUNJARPAbet designed in CHAPTER II, have also been graphically evaluated in this 
chapter. 


CHAPTER VII summarizes the whole work, highlighting the original 
contribution of this project. This chapter concludes by stating the future directions. 


The thesis also includes five Appendices as follows: APPENDIX A and 
APPENDIX B include the Text Corpus developed in this project. APPENDIX A includes 
about 200 sentences and single line folk songs (bolis) while APPENDIX B includes more 
than 100 bolis. In both appendices, the complete corpus has been transcribed both in the 
IPA and the newly developed phonetic alphabet PUNJARPAbet. APPENDIX C 
describes the Levinson-Durbin algorithm to efficiently solve the Normal Equations 
resulting from the Autocorrelation Method for the linear prediction analysis. APPENDIX 
D includes the SIFT algorithm for computing the V/UV decision and the pitch extraction. 
APPENDIX E lists twenty-five Punjabi speech sentences synthesized in the present 
study. 


The thesis concludes with the Bibliography section. The Bibliography section 
has been organized in five groups as follows: 


1. All Books and Journals [1-130] 

2. All Dictionaries and Encyclopaedias [131-140] 

3. Internet Websites [141-160] 

4. All Punjabi References (alphabetical order in English) [1-78] 
5. All Punjabi References (alphabetical order in Punjabi) [1-78] 


The references in group 1, 2, and 4 are listed in the alphabetical order based on 
the last name of the first author according to the English language (Roman alphabet). The 
internet websites (group 3 above) are listed in the order they are cited in the thesis. The 
fifth group consisting of all Punjabi references is listed in the alphabetical order based on 
the last name of the first author according to the Punjabi language (Gurmukhi alphabet). 
CHAPTER II 
LINGUISTICS AND PUNJARPAbet¹ 


In this chapter, a new phonetic alphabet consistent with the ARPAbet phonetic 
transcription or coding of the corpora in the Punjabi language has been developed. A 
number of coding schemes have been used for international as well as Indian languages 
in literature. The need for a new coding scheme becomes obvious when we investigate 
the existing coding schemes such as IPA, ARPAbet, ISCII, SAMPA (and its extended 
versions X-SAMPA and SAMPROSA), INSROT and wx-Roman. The laborious, 
irritating, and time-consuming necessities for dealing with the special symbols for 
vowels, nasalization, tones, and inserting diacritical marks in most of these schemes 
(especially where two diacritical marks over the ten vowel signs are needed for the 
Punjabi language in Gurmukhi script) confirm the need for a new coding scheme which 
has been designed in this chapter. 


Before designing the new scheme called PUNJARPAbet, we need to study 
several essential concepts leading to our objective. These concepts include linguistics, 
phonetics, Punjabi language and Gurmukhi script, speech sounds, Punjabi speech sounds, 
and the existing coding schemes. 


2.1. INTRODUCTION 


The study of the rules of a language, which govern the arrangement of the sounds 
or symbols, is the domain of Linguistics. Speech signals consist of a sequence 
of distinctive sounds known as phonemes. Most languages are described and studied in 
terms of phonemes. The study and classification of the speech sounds is 
known as Phonetics. The phonetics of a language can be studied in a number of ways. Each 
phoneme represents the smallest unit of speech information and is, therefore, 
characterized by a unique combination of many underlying speech feature values. For the 
purpose of digital speech processing of a language, it is sufficient to study the following 
three most important features of the phonemes: the place of articulation, waveforms, and 
spectrographic characterization. The first feature is studied in this chapter, whereas the 
remaining two features are studied in Chapter VI. 


Before we study phonetics with reference to the Punjabi language, it is 
appropriate to study phonetics with reference to the English language. It is interesting 
to note that as a particular language grows, the number of phonemes in that language also 
keeps on growing. This simple and interesting fact will be confirmed as we study 
phonetics with reference to the English and the Punjabi languages. 

¹ The work reported in this chapter has been published in Journal of Circuits, Systems, and Computers (JCSC), Volume 23, No. 5, 
June 2014 (SCI). 


The first issue for the Punjabi speech processing is the actual representation of the 
phonemes. In linguistics, the International Phonetic Alphabet (IPA) designed by 
International Phonetic Association is used to represent the phonemes. However, the need 
for some special symbols that are not readily available on computer keyboards is the 
major limitation of this representation [15, pp. 801]. 


To simplify this problem, ARPAbet representation came into existence as a result 
of the ARPA SUR project [15, pp. 527-529]. A website [148] describes ARPAbet as 
follows: “ARPAbet is a phonetic transcription code developed by Advanced Research 
Projects Agency (ARPA) as a part of their Speech Understanding Project (1971-1976). It 
represents each phoneme of General American English with a distinct sequence of ASCII 
characters. ARPAbet has been used in several speech synthesizers, like SAM for the 
Commodore 64, SAY for the Amiga and TextAssist for the PC... It is also used in the 
CMU Pronouncing Dictionary. In ARPAbet, every phoneme is represented by one or two 
capital letters. Digits are used as stress indicators and are placed at the end of the stressed 
syllabic vowel. Punctuation marks are used like in the written language, to represent 
intonation changes at the end of clauses and sentences.” There are three stress values: 0, 
1, and 2, signifying “no stress”, “primary stress”, and “secondary stress” respectively. 
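
The stress convention quoted above can be illustrated with a minimal Python sketch. It is not part of the original work: the function name parse_arpabet and the sample transcription (written in the style of a CMU Pronouncing Dictionary entry for the word "button") are assumptions made only for illustration. Up to three capital letters are accepted per symbol because Table 2.1 also lists the three-letter symbol AXR.

```python
import re

# Stress digits as described in the text: 0 = no stress, 1 = primary, 2 = secondary.
STRESS_LABELS = {"0": "no stress", "1": "primary stress", "2": "secondary stress"}

# One to three capital letters, optionally followed by a single stress digit.
_TOKEN = re.compile(r"^([A-Z]{1,3})([012])?$")


def parse_arpabet(transcription):
    """Split a space-separated ARPAbet string into (phoneme, stress) pairs.

    The stress entry is None for symbols that carry no stress digit.
    """
    result = []
    for token in transcription.split():
        match = _TOKEN.match(token)
        if match is None:
            raise ValueError(f"not an ARPAbet token: {token!r}")
        phoneme, digit = match.groups()
        result.append((phoneme, STRESS_LABELS.get(digit)))
    return result


# Illustrative transcription of "button" (assumed example, not from the thesis).
for phoneme, stress in parse_arpabet("B AH1 T AH0 N"):
    print(phoneme, stress)
```

Only the syllabic vowels carry a digit, so the consonants are returned with a stress value of None.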


2.2. PHONETICS 


Rabiner and Schafer (1978) state that: “for American English, there are about 42 
phonemes including vowels, diphthongs, semivowels and consonants.” However, in the 
Table 3.1 [88, pp. 43], they list only 41 phonemes (11 Vowels, 6 Diphthongs, 4 
Semivowels, 20 Consonants). 


Rabiner and Schafer in their second book (2007) state that: “the number of 
phonemes depends upon the language and the refinement of the analysis. For most 
languages the numbers of phonemes is between 32 and 64.” The authors have presented 
a set of 39 phonemes [87, Table 2.1, pp. 18-19] which is used in the CMU Pronouncing 
Dictionary [149] under the title “Condensed list of ARPAbet phonetic symbols for North 
American English.” 


The same authors in their third book (2011) state that “for American English, 
there are somewhere between 39 and 48 phonemes” and have given a standard list of 48 
phonemes [89, Table 3.3, pp 87] under the title “Condensed list of phonetic symbols for 
American English.” This Table is reproduced here as Table 2.1. This Table is very 
valuable because it includes the IPA representation, and the ARPAbet representation, 
along with an example of a word where the phoneme appears. This table, thus, represents 
the latest approach of representing phonemes in the speech processing literature. 


Table 2.1: Condensed list of phonetic symbols for American English 

IPA Phoneme      | ARPAbet | Example | IPA Phoneme      | ARPAbet | Example 
/i/              | IY      | beet    | /ŋ/              | NX      | sing 
/ɪ/              | IH      | bit     | /p/              | P       | pat 
/ɚ/              | AXR     | butter  | /t/              | T       | ten 
/ɛ/, /e/         | EH      | bet     | /k/              | K       | kit 
/æ/              | AE      | bat     | /b/              | B       | bet 
/ɑ/              | AA      | Bob     | /d/              | D       | debt 
/ʌ/              | AH      | but     | /g/              | G       | get 
/ɔ/              | AO      | bought  | /h/              | HH      | hat 
/o/              | OW      | boat    | /f/              | F       | fat 
/ʊ/              | UH      | book    | /θ/              | TH      | thing 
/u/              | UW      | boot    | /s/              | S       | sat 
/ə/              | AX      | about   | /ʃ/, /sh/, /š/   | SH      | shut 
/ɨ/              | IX      | roses   | /v/              | V       | vat 
/ɝ/              | ER      | bird    | /ð/              | DH      | that 
/e/              | EY      | bait    | /z/              | Z       | zoo 
/aʷ/             | AW      | down    | /ʒ/, /zh/, /ž/   | ZH      | azure 
/aʸ/             | AY      | buy     | /tʃ/, /č/        | CH      | church 
/ɔʸ/             | OY      | boy     | /dʒ/, /ǰ/        | JH      | judge 
/y/              | Y       | you     | /ʍ/              | WH      | which 
/w/              | W       | wit     | /l̩/              | EL      | battle 
/r/              | R       | rent    | /m̩/              | EM      | bottom 
/l/              | L       | let     | /n̩/              | EN      | button 
/m/              | M       | met     | /ɾ/              | DX      | batter 
/n/              | N       | net     | /ʔ/              | Q       | (glottal stop) 


We have used a similar approach in representing the Punjabi speech sounds in 
the next sections (Table 2.2). 


It is a well-known fact that the vowels have the longest duration in natural speech 
[89, pp. 89]. It is interesting to note that although the vowels (the most well-defined 
sounds of a language) play a decisively major role in speech (the spoken language), it is 
the consonants that carry far more information in the written version of the speech 
sentence. The vowels convey very little linguistic information as far as the orthography 
of the spoken sentence is concerned. Consider two sentences, one with all vowels 
removed: 


ag? 
and the other with all consonants removed: 
S.- A 
ct a? 
It will be much easier for a native speaker to reproduce the complete original sentence: 


ast t ? 


in the first case as compared with the second case. 
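
The same observation can be tried on any written sentence. The following Python sketch is only an illustration and is not taken from the original work: it uses an assumed English sentence rather than the Punjabi example above, treats the five letters a, e, i, o, u as the vowels of the orthography, and replaces each removed letter with an underscore.

```python
VOWELS = set("aeiouAEIOU")


def strip_vowels(sentence):
    """Replace every vowel letter with an underscore."""
    return "".join("_" if ch in VOWELS else ch for ch in sentence)


def strip_consonants(sentence):
    """Replace every consonant letter with an underscore."""
    return "".join(
        "_" if ch.isalpha() and ch not in VOWELS else ch for ch in sentence
    )


sentence = "where are you going"      # assumed example sentence
print(strip_vowels(sentence))         # wh_r_ _r_ y__ g__ng
print(strip_consonants(sentence))     # __e_e a_e _ou _oi__
```

The first output is usually still readable, whereas the second gives the reader very little to work with, which is the point made in the text.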


2.3. THE PUNJABI LANGUAGE AND GURMUKHI SCRIPT 


In the present work, we have developed a new phonetic alphabet for the Punjabi 
language in the Gurmukhi script. Therefore, before discussing the design methodology of 
the new alphabet, it makes sense to develop the basic understanding of the Punjabi 
language and its Gurmukhi script. 


The Punjabi alphabet in the Gurmukhi script consists of 35 letters (because thirty-five 
in Punjabi is known as ਪੈਂਤੀ (paintī), the Gurmukhi alphabet is also known as 
paintī), and 6 letters with a dot ( ਼ ), known as ਬਿੰਦੀ (bindī), below the letter symbol. Therefore, at 
present, there are 35 + 6 = 41 letters in the Gurmukhi alphabet. The complete alphabet is 
given below: 


ੳ ਅ ੲ ਸ ਹ 
ਕ ਖ ਗ ਘ ਙ 
ਚ ਛ ਜ ਝ ਞ 
ਟ ਠ ਡ ਢ ਣ 
ਤ ਥ ਦ ਧ ਨ 
ਪ ਫ ਬ ਭ ਮ 
ਯ ਰ ਲ ਵ ੜ 
ਸ਼ ਖ਼ ਗ਼ ਜ਼ ਫ਼ ਲ਼ 
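
For computer processing, the count 35 + 6 = 41 can be checked against the Unicode encoding of the Gurmukhi script. The sketch below is an illustration and not part of the original work; the variable names are hypothetical, and the code points are those of the Unicode Gurmukhi block. It builds the six dotted letters as a base letter followed by the nukta sign (U+0A3C) and confirms that the precomposed code points decompose to the same sequences.

```python
import unicodedata

NUKTA = "\u0A3C"  # GURMUKHI SIGN NUKTA, the dot written below a letter

# Code points of the 35 traditional letters, in alphabetical order (rows of five).
PAINTI = [chr(cp) for cp in (
    0x0A73, 0x0A05, 0x0A72, 0x0A38, 0x0A39,
    0x0A15, 0x0A16, 0x0A17, 0x0A18, 0x0A19,
    0x0A1A, 0x0A1B, 0x0A1C, 0x0A1D, 0x0A1E,
    0x0A1F, 0x0A20, 0x0A21, 0x0A22, 0x0A23,
    0x0A24, 0x0A25, 0x0A26, 0x0A27, 0x0A28,
    0x0A2A, 0x0A2B, 0x0A2C, 0x0A2D, 0x0A2E,
    0x0A2F, 0x0A30, 0x0A32, 0x0A35, 0x0A5C,
)]

# Base letters of the six dotted letters and their precomposed code points.
DOTTED_BASES = [chr(cp) for cp in (0x0A38, 0x0A16, 0x0A17, 0x0A1C, 0x0A2B, 0x0A32)]
PRECOMPOSED = [chr(cp) for cp in (0x0A36, 0x0A59, 0x0A5A, 0x0A5B, 0x0A5E, 0x0A33)]

dotted = [base + NUKTA for base in DOTTED_BASES]
alphabet = PAINTI + dotted
print(len(PAINTI), len(dotted), len(alphabet))  # 35 6 41

# Canonical decomposition (NFD) of each precomposed letter is base + nukta.
for precomposed, sequence in zip(PRECOMPOSED, dotted):
    assert unicodedata.normalize("NFD", precomposed) == sequence
```

A practical consequence is that text should be normalized before letters are counted or compared, since the same dotted letter may be stored either as one code point or as two.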


Out of these 41 letters, the first three letters (ੳ, ਅ, ੲ) are known as swar or 
vowel forms or vowel consonants or semi consonants or matra vahak, and the remaining 
38 letters are known as vianjans (consonants). There are ten vowel symbols. A vowel 
symbol is known as a laga/matra (or just matra) or a diacritical mark or an accessory 
sign or a vowel sign. These vowel symbols also include mukta (no symbol). Using these 
ten vowel symbols (matras), the first three letters (ੳ, ਅ, ੲ) of the Gurmukhi script 
generate the following ten non-nasalized vowels in the Punjabi language: 


ਉ, ਊ, ਓ, ਅ, ਆ, ਐ, ਔ, ਇ, ਈ, ਏ 


In other words, the first three letters use or ‘carry’ or ‘drive’ the matras to 
generate vowels. Vahak means a driver or a carrier. That is why the first three letters 
(ura, era, and iri) are known as matra vahak or vowel forms [66]. There are two symbols, 
ਬਿੰਦੀ ( ਂ = bindī) and ਟਿੱਪੀ ( ੰ = ṭippī), for the nasalization of sounds. Using these two symbols, 
the first three letters generate the following ten nasalized vowels in the Punjabi 
language: 


ਉਂ, ਊਂ, ਓਂ, ਅੰ, ਆਂ, ਐਂ, ਔਂ, ਇੰ, ਈਂ, ਏਂ 


There is one symbol ( ੱ = addak) for the reduplication of the sound of a consonant. 
The reduplication produces the sound of long consonants. According to Bahri [13, pp. 5], 
“Long (or double) consonants have an overhead crescent sign, called adhak, before them” 
as in the words ਸੱਪ (means snake), ਅੱਧਾ (means half), and ਪੱਤਾ (means leaf). 


After the first three letters (ੳ, ਅ, ੲ), the next two consonants (ਸ, ਹ) are the root 
class consonants. Out of the remaining 36 consonants, the next 25 letters are grouped into 
five tolis (groups) of five letters each (kavarg, cavarg, ṭavarg, tavarg, and pavarg toli), 
according to their overall phonetic characteristics (the place of origin, pronunciation, 
participating articulators: lips, jaws, lungs, mouth, nose, tongue, palate, and throat). The 
next group of five letters (yaya, rara, lalla, vava, ṛaṛa) is called the antim (means last) toli. 
The last group of six consonants is a special group with a dot (bindī) below the letter. It is 
called the navin (means new) toli. The first five of these consonants (ਸ਼ ਖ਼ ਗ਼ ਜ਼ ਫ਼) were 
introduced in the previous century mainly to accommodate words from languages 
such as English, Sanskrit, Arabic and Persian, while the sixth and last consonant (ਲ਼) 
is an original sound from the Punjabi culture. It is important to note that the existence of 
the phoneme [ਲ਼] = [ḷ] was mentioned in the literature long before the 1970s, 
but the character ਲ਼ was officially introduced in 1979 in the 2nd edition of 
Punjabi Praveshka (the text book for Grade I students) published by the Punjab School 
Education Board [84, pp. 15]. The 1st edition of Punjabi Praveshka, published in 
1978 [pp. 15], had only the first five of these consonants (ਸ਼ ਖ਼ ਗ਼ ਜ਼ ਫ਼) of the navin toli. In 
other words, the first-ever batch of Grade I students to officially learn of the existence of 
ਲ਼ was the batch starting Grade I in the year 1979. To the best of our knowledge, this fact 
(verified by the author with considerable effort in finding an old Grade I text) is being 
stated in print for the first time. 


There are three conjunct consonants (or half-letters): h, r, v (੍ਹ, ੍ਰ, ੍ਵ). Many more 
conjunct consonants are spoken, but not written [13, pp. 5] at present. The most authentic 
example is Guru Granth Sahib (The Holy Manuscript of Sikhism), compiled by the fifth 
guru (Guru Arjun Dev) of the Sikhs in 1604, which has at least the following four more 
conjunct consonants: c, ṭ, t, n (੍ਚ, ੍ਟ, ੍ਤ, ੍ਨ). Guru Granth Sahib includes the writings of 
three dozen spiritual poets (six gurus who lived between 1469 and 1675, fifteen bhagats 
who lived before the gurus, and fifteen contemporaries of different gurus). The description of 
all thirty-five letters of the Gurmukhi script as we know them today can be found in print 
in Guru Granth Sahib in the writings of the first guru (Guru Nanak Dev) and Bhagat 
Kabir (the word guru means teacher, and the word bhagat means meditator). 


2.4. SPEECH SOUNDS 


The details of the Punjabi language and Gurmukhi script described in the previous 
section, concepts of speech sounds described in this section, and the classification of the 
Punjabi speech sounds described in the next sections are mainly based on the pioneering 
works referenced in Jain [53], Arun [10], Gill [43], Gill and Gleason, Jr. [42,46], Sandhu 
[95-97], Joga Singh [43, 84], Dulai and Koul [34], Bahri [13], Agnihotri [1], Joshi 
[43, 56-57], Harkirat Singh [43, 110-111, 139], Tej Bhatia [18-19], Bhardwaj [16-17], 
Prem Prakash Singh [43, 116-117], and Surjeet Singh [43, 119]. 


Speech sounds in a language are generally classified in two broad categories: 
segmental and suprasegmental. For the purpose of speech analysis and synthesis, we 
need to understand both: segmental sounds and suprasegmental sounds as described 
below: 


2.4.1. Segmental Sounds 
The Segmental sounds are further divided into vowels and consonants. 


2.4.1.1. Vowels 


Sandhu [95, pp. 35] summarizes the production of vowels as follows: “For the 
production of all the vowels, vocal cords vibrate and in the Panjabi language, there are no 
voiceless vowels like sounds.” While producing vowels, the air stream coming from the 
lungs passes through the oral cavity without any obstruction. Different parts of the tongue 
move to different heights within the oral cavity; and the shape of the lips is modified. The 
movement of the different parts of the tongue (like front, central or back), the shape of 
the lips, and the heights to which a specific part of the tongue is raised play a decisive 
role in the production of different vowels. In the production of vowels, vocal cords may 
vibrate to produce the voiced vowels. The nasal passage remains closed when the non- 
nasal or oral vowels are produced and it remains open allowing the air stream to pass 
through the nasal cavity thus producing nasalized vowels [34]. 


Besides a broad classification of vowels into nasal and non-nasal categories, the 
vowel sounds can be classified on the basis of the part of the tongue used in the 
articulation, the height of the tongue, and the position of the lips [34]. 


Based on the part of the tongue used in the process of articulation, vowels are 
classified as front vowels, central vowels and back vowels. On the basis of the height of 
the tongue in the articulation of vowels, the vowels are classified as high, lower-high, 
mid, mid-low and low vowels. Based on the posture of the lips during the production of 
vowels, vowels can be classified as rounded and unrounded vowels. Based on duration, 
they can also be classified as short and long vowels. 


2.4.1.2. Consonants 


In the production of the consonants, the air current coming from the lungs is 
stopped or impeded at different points of articulation (POA). The shape of the chambers 
through which the air passes is also modified accordingly. In the broad field of Phonetics, 
consonants have been classified as stops, affricates, nasals, laterals, trills, flaps, 
fricatives, and semi-vowels based on the Manner of Articulation (MOA) as described below: 


While producing stops, the air stream coming from the lungs is blocked at some point in the oral 
cavity and the nasal passage is closed. The contact is released 
suddenly with an explosion, which is why stops such as [p] and [b] are also known as plosives. Stops are 
sometimes produced by sucking the air in (implosives). As Punjabi stops are not implosives, the 
process of articulation for implosives is not discussed in this work [34]. 


In the production of affricates, the air stream coming from the lungs is stopped in the oral cavity, but the contact 
is withdrawn slowly, causing friction when the air escapes. 
Examples are [c] and [j]. 


In producing nasals, the air stream coming from the lungs is directed into the nasal 
cavity by lowering the soft palate. When the contact of the articulator is released, the air 
passes through the nasal cavity; the oral cavity is in most cases fully closed. 
Examples are [m] and [n]. 


While producing laterals, the air stream coming from the lungs is blocked in the 
middle of the mouth by raising the mid surface of the tongue, and then the air is allowed to 
escape from the sides of the tongue. An example is [l]. 


A rapid vibration of the tip of the tongue against the teeth ridge occurs while 
producing trills. One example is [r]. 


In the production of flaps, the tip of the tongue approaches the hard palate but does not make contact. 
The air escapes between the tip of the tongue and the palate. One 
example is [ṛ]. 


In the production of fricatives, the air stream is pushed 
through a narrow opening, which causes friction. Two examples are [f] and [s]. 


While producing semi-vowels (also known as frictionless continuants), the 
position of the speech organs remains the same as in the case of fricatives, but there is a 
weaker breath force so that no audible friction is heard. Two examples are [y] and [v]. 


Based on the points of articulation (POA), consonants can be classified as 
bilabials, dentals, alveolars, retroflexes, palato-alveolars, palatals, velars, and glottals. 
While producing the various speech sounds just mentioned, the air flow coming from 
the lungs is checked or impeded at these points [34]. 


2.4.2. Suprasegmental Sounds 

The suprasegmental or nonsegmental Punjabi sounds can be understood in terms 
of the following features: nasality, gemination, stress, vowel-length, duplication, three 
tones (low, mid, high), five intonations (falling pitch, high rising pitch, high level pitch, 
low rising pitch, mid level pitch), style (e.g., formal, informal), tempo (e.g., slow, fast), 
juncture, and three conjunct consonants (h, r, v). 


2.5. THE PUNJABI SPEECH SOUNDS 

2.5.1. Punjabi Alphabet 


The complete Punjabi alphabet is reproduced here once again for the sake of 
completeness, and to maintain continuity: 


ੳ ਅ ੲ ਸ ਹ 
ਕ ਖ ਗ ਘ ਙ 
ਚ ਛ ਜ ਝ ਞ 
ਟ ਠ ਡ ਢ ਣ 
ਤ ਥ ਦ ਧ ਨ 
ਪ ਫ ਬ ਭ ਮ 
ਯ ਰ ਲ ਵ ੜ 
ਸ਼ ਖ਼ ਗ਼ ਜ਼ ਫ਼ ਲ਼ 


The segmental speech sounds of the Punjabi language have been classified in two 
broad categories of vowel phonemes and consonant phonemes as outlined in the next two 
sub-sections. 


2.5.2. Vowel phonemes 

(i) Non-nasalized vowels: 

front unrounded vowels: 
[i], [I], [e], [ɛ] 

central vowels: 
[ə], [a] 

back rounded vowels: 
[u], [U], [o], [ɔ] 


(ii) Nasalized vowels: 

The ten non-nasalized vowels described above are also present in nasalized form: 

[ĩ], [Ĩ], [ə̃], [ã], [ẽ], [ɛ̃], [ũ], [Ũ], [õ], [ɔ̃] 

These ten vowels have also been classified as short and long vowels as 
follows: 

Short vowels: 
[I], [U], [ə] 

Long vowels: 
[i], [e], [ɛ], [a], [u], [o], [ɔ] 
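
The vowel classification above lends itself to a small lookup structure. The following Python sketch is illustrative only and is not the representation used in this work: the class name Vowel, its field names, and the use of the Unicode combining tilde to mark nasalization are assumptions, and the central vowels are assumed to be unrounded.

```python
from dataclasses import dataclass

TILDE = "\u0303"  # combining tilde, used here to mark nasalization


@dataclass(frozen=True)
class Vowel:
    symbol: str
    position: str   # "front", "central" or "back"
    rounded: bool
    length: str     # "short" or "long"


VOWELS = [
    Vowel("i", "front", False, "long"),
    Vowel("I", "front", False, "short"),
    Vowel("e", "front", False, "long"),
    Vowel("ɛ", "front", False, "long"),
    Vowel("ə", "central", False, "short"),
    Vowel("a", "central", False, "long"),
    Vowel("u", "back", True, "long"),
    Vowel("U", "back", True, "short"),
    Vowel("o", "back", True, "long"),
    Vowel("ɔ", "back", True, "long"),
]


def nasalized(vowel):
    """Return the symbol of the nasalized counterpart of a vowel."""
    return vowel.symbol + TILDE


short = [v.symbol for v in VOWELS if v.length == "short"]
long_ = [v.symbol for v in VOWELS if v.length == "long"]
print(len(VOWELS), "oral vowels;", len(short), "short,", len(long_), "long")
print("nasalized:", ", ".join(nasalized(v) for v in VOWELS))
```

Deriving the nasalized set from the oral set keeps the two inventories in step, so the count of ten nasalized vowels follows directly from the ten oral vowels.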


2.5.3. Consonant Phonemes 


Stops: 
[ਪ] = [p], [ਫ] = [ph], [ਬ] = [b] 
[ਤ] = [t], [ਥ] = [th], [ਦ] = [d] 
[ਟ] = [ṭ], [ਠ] = [ṭh], [ਡ] = [ḍ] 
[ਕ] = [k], [ਖ] = [kh], [ਗ] = [g] 

Affricates: 
[ਚ] = [c], [ਛ] = [ch], [ਜ] = [j] 

Nasals: 
[ਮ] = [m], [ਨ] = [n], [ਣ] = [ṇ] 

Laterals: 
[ਲ] = [l], [ਲ਼] = [ḷ] 

Trill: 
[ਰ] = [r] 

Flap: 
[ੜ] = [ṛ] 

Fricatives: 
[ਸ] = [s], [ਸ਼] = [š], [ਜ਼] = [z], [ਫ਼] = [f], 
[ਖ਼] = [x], [ਗ਼] = [ɣ], [ਹ] = [h] 

Semi-vowels: 
[ਯ] = [y], [ਵ] = [v] 
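
The consonant classes listed above can be held in a simple table keyed by manner of articulation. The Python sketch below is an illustration under stated assumptions and is not the data structure used in this work: the dictionary and function names are hypothetical, and the romanized symbols (including the dotted retroflex letters) follow a common transliteration convention rather than a scheme defined in the source.

```python
# Romanized consonant symbols grouped by manner of articulation (assumed notation).
CONSONANTS_BY_MANNER = {
    "stop": ["p", "ph", "b", "t", "th", "d", "ṭ", "ṭh", "ḍ", "k", "kh", "g"],
    "affricate": ["c", "ch", "j"],
    "nasal": ["m", "n", "ṇ"],
    "lateral": ["l", "ḷ"],
    "trill": ["r"],
    "flap": ["ṛ"],
    "fricative": ["s", "š", "z", "f", "x", "ɣ", "h"],
    "semi-vowel": ["y", "v"],
}

# Reverse index: symbol -> manner of articulation.
MANNER_OF = {
    symbol: manner
    for manner, symbols in CONSONANTS_BY_MANNER.items()
    for symbol in symbols
}


def manner_of(symbol):
    """Look up the manner of articulation of a romanized consonant symbol."""
    return MANNER_OF[symbol]


for manner, symbols in CONSONANTS_BY_MANNER.items():
    print(f"{manner:11s} {len(symbols):2d}  {' '.join(symbols)}")
print("total:", len(MANNER_OF))
```

Because every symbol appears in exactly one class, the reverse index has the same number of entries as the full list, which gives a quick consistency check on the inventory.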


The results of the discussion regarding the English speech sounds and the Punjabi 
speech sounds are summarized in Tables 2.2 and 2.3 (based on Rabiner and Schafer 
[87-89], and Dulai and Koul [34] respectively). It should be noted that the English 
language (Roman script) has 26 letters leading to 42-48 phonemes, whereas the Punjabi 
language (Gurmukhi script) consists of 56 items (35 letters, 6 supporting letters with a 
dot at the foot, 3 conjunct consonants (half-letters), 9 supporting symbols, 3 
reduplication/nasality symbols) leading to many phonemes by using different 


permutations and combinations. The following sounds, and phonemes (e.g., nasals: 3, &, 
©; tonemes: 4, ¥, B, U, 3; stops: 3, 4; flap: 3; lateral: %) do not exist in the English 
language (Phonemes involving tones in a language are known as Tonemes). Several other 
sounds with conjunct consonant d (e.g., 9, 4, J, 3, ZX) also don’t exist in the English 
speech sounds (see Table 2.5 for details). Tables 2.4 and 2.5 are based on Sandhu [97], 
and Bahri [13] respectively. These tables have been added here to confirm the simple fact 
that no two authors agree on the classification of the Punjabi 
consonants and Punjabi speech sounds. These important observations make speech 
processing projects in the Punjabi language considerably different from, and more challenging 
than, similar projects in the English language. 
Table 2.2: ENGLISH SPEECH SOUNDS (based on Rabiner and Schafer [87-89]) 
[26 letters, 41 phonemes: 42(1978), 39(2007), 48(2011)] 

1. Vowel phonemes (11) 
   Front vowels: [IY], [I], [E], [AE] 
   Mid vowels:   [A], [ER], [UH], [OW] 
   Back vowels:  [OO], [U], [O] 

2. Diphthongs (6) 
   [AI], [OI], [AU], [EI], [OU], [JU] 

3. Semi-vowels (4) 
   Liquids: [R], [L] 
   Glides:  [W], [Y] 

4. Consonants (6+2+3+8+1=20) 
   Stops:      voiced: [B], [D], [G]   unvoiced: [P], [T], [K] 
   Affricates: [DZH], [TSH] 
   Nasals:     [M], [N], [NX] 
   Fricatives: voiced: [V], [TH], [Z], [ZH] 
               unvoiced: [F], [THE], [S], [SH] 
   Whisper:    [H] 

Table 2.3: PUNJABI SPEECH SOUNDS (based on Dulai and Koul [34]) 
[56 items: 35 letters, 6 supporting letters, 3 half-letters, 
9 supporting symbols, 3 gemination/nasality symbols] 

1. Vowels 
   Short vowels: [I], [U], [ə] 
   Long vowels:  [i], [e], [ɛ], [a], [u], [o], [ɔ] 

2. Consonants 
   Stops:       [p], [ph], [b] 
                [t], [th], [d] 
                [ṭ], [ṭh], [ḍ] 
                [k], [kh], [g] 
   Affricates:  [c], [ch], [j] 
   Nasals:      [m], [n], [ṇ] 
   Laterals:    [l], [ḷ] 
   Trill:       [r] 
   Flap:        [ṛ] 
   Fricatives:  [s], [š], [z], [f], [x], [ɣ], [h] 
   Semi-vowels: [y], [v] 
Table 2.4: Consonantal Phonemes (Sandhu [97], pp. 8) 

                            | Labial | Dental | Retroflex | Palatal | Velar 
Voiceless unaspirated stops | p      | t      | ṭ         | c       | k 
Voiceless aspirated stops   | ph     | th     | ṭh        | ch      | kh 
Voiced unaspirated stops    | b      | d      | ḍ         | j       | g 
Nasals                      | m      | n      | ṇ         |         | 
Laterals                    |        | l      |           |         | 
Trills                      |        | r      | ṛ         |         | 
Fricatives                  |        | s      |           | ś       | h 
Semi vowels                 | w      |        |           | y       | 


Table 2.5: Panjabi Consonants (Bahri [13], pp. 11) 

1. Stops | 2. Affricate Stops | 3. Nasal Stops | 4. Laterals | 5. Fricatives | 6. Rolled | 7. Flapped | 8. Semi-vowels 
VLIV > VL Vv VL Vv Vv VL Vv Vv VL Vv 
U/A> U A U A u fA lu A U A U A U A U A U A U 
Velars ay) yu | wT | we =] J 
Palatals qq 
Palato- glelaly.| ze FH 
alveolars 
Cerebrals | C | O | 3 | Wx = 3/3 
Alveolars S| o* | A B | S| J | gx 
Dentals | 3 | 4 | © | U* 
Labio- 
dentals 
Labials yY;}|ec|a| gs H | Ye z 

VL = Voiceless (aka Unvoiced), V = Voiced; U = Unaspirated, A = Aspirated 
* Voiced aspirates 
2.6. PHONETIC CODING SCHEMES 


At present, a major problem in the area of Punjabi speech processing is that the 
symbols consistent with the ARPAbet phonetic transcription do not exist in the Punjabi 
Language. Consequently, different authors are using different notations. 


In the present work, a new phonetic alphabet consistent with the ARPAbet 
phonetic transcription/coding of the corpora in the Punjabi language has been developed 
in Table 2.7 (objective 1). A number of coding schemes have been used in literature for 
different international languages [130] as well as the official languages of India [141]. 
The need for a new coding scheme for the Punjabi language becomes obvious when we 
investigate the existing coding schemes. Some of the major phonetic coding schemes are 
briefly discussed below. 


2.6.1. International Phonetic Association: IPA (1888) 


The International Phonetic Alphabet (IPA) was created in 1888 by a group of 
prominent European phoneticians as an effort to facilitate and standardize language 
transcription [The International Phonetic Alphabet (IPA): 1989 (Das Mandal), 1886 
(LSD]. Even today, the IPA is the most widely accepted and used scheme [24, 9]. Because 
it has enough entries to handle the phonemes of almost all the languages of the world, the 
IPA is considered most appropriate for handwritten work. However, the following problems 
arise when one tries to use the IPA for computerized speech processing [Deller et al., 24, 
pp. 117]: 


1. The IPA cannot be typed on a “conventional typewriter or a computer keyboard.” 

2. Different books use different symbols or notations for the same elements, so that 
the notation can differ even within the IPA. As an example, see 
Table 2.6, where three different books use three different notations. 


2.6.2. Advanced Research Projects Agency (ARPAbet) 


As a solution to these problems, the United States Advanced Research Projects Agency 
(ARPA) developed the phonetic alphabet known as ARPAbet, in two versions (a 
single-letter version and an upper-case version), both suitable for typing: 


1. Single-letter notation: This notation uses single-letter symbols. It is a mixture of 
lower case letters, upper-case letters, and non-alphabet symbols [24]. 

2. Upper-case notation: This notation uses upper-case symbols only. Inevitably, it uses 
some double-letter designators [24, 89]. 


At present, a major problem in the area of Punjabi speech processing is that the 
symbols consistent with the ARPAbet phonetic transcription do not exist in the Punjabi 
language. 
2.6.3. Indian Script Code for Information Interchange (ISCII) 


ISCII-88 is an 8-bit code, developed gradually up to the year 1988, that focuses on the 
bilingual nature (one Indian script and the Roman script used by ASCII) of the 
linguistic scenario in India [128]. The earlier versions of ISCII were 7-bit codes. ISCII-88 
is an 8-bit code because 7 bits are required to represent each of the two code sets (the 
7-bit ISCII code and the 7-bit ASCII code), and the eighth bit distinguishes between them. 
After about 3 years of work, the Center for Development of Advanced Computing (C-DAC) 
modified ISCII-88 and proposed an improved version 
known as ISCII-91 as the national standard for Indic scripts [120]. This document was 
accepted by the Bureau of Indian Standards (BIS) and published by BIS in 1991 as 
ISCII-91 (IS 13194:1991). An eleven-column table listing the eleven scripts, and 
the corresponding languages, that ISCII-91 can handle is given as Annexure-1 in [120, pp. 
320-323]. The contents of this several-page table are summarized below. The 
languages that make use of these eleven scripts are given in brackets (e.g., the Devanagari or 
DEV script is employed by nine languages, as listed in item 2 below): 


1. RMN = Roman (English) 

2. DEV = Devanagari (Dogri, Hindi, Kashmiri, Konkani, Maithili, Marathi, Nepali, 
Sanskrit, Sindhi) 

3. PNJ = Gurmukhi (Punjabi) 

4. GJR = Gujarati (Gujarati) 

5. ORI = Oriya (Oriya) 

6. BNG = Bangla (Bengali) 

7. ASM = Assamia (Assamese/Axomiya, Bodo) 

8. TLG = Telugu (Telugu) 

9. KND = Kannada (Kannada) 

10. MLM = Malayalam (Malayalam) 

11. TML = Tamil (Tamil) 


It should be noted that the Sindhi and Kashmiri languages can be written in the Devanagari 
as well as the Urdu script. The ISCII code is considered suitable for the coding of most of the 
Indian languages and Indic scripts because of its capability to represent matras, vowels, 
consonants, numerals, and diacritical marks. That is one of the major reasons why the 
Unicode representation of the different coding sets for Indic scripts is based on 
ISCII-88. 
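The 8-bit layout described above can be illustrated with a short Python sketch. It assumes, as is usual for ASCII-compatible 8-bit codes, that byte values below 128 belong to the 7-bit ASCII set and that the remaining values carry a 7-bit index into the Indic code set. The function name `split_code_sets` and the sample byte values are hypothetical and are not taken from the ISCII standard.

```python
def split_code_sets(data):
    """Classify each byte of an 8-bit stream by its high (eighth) bit.

    Assumption: high bit clear -> 7-bit ASCII code,
                high bit set   -> 7-bit index into the Indic code set.
    """
    result = []
    for byte in data:
        if byte < 0x80:
            result.append(("ascii", byte))
        else:
            # Mask off the high bit to recover the 7-bit index.
            result.append(("indic", byte & 0x7F))
    return result


if __name__ == "__main__":
    # Arbitrary sample bytes; they are not claimed to be particular letters.
    sample = bytes([0x41, 0xB3, 0x20, 0xCC])
    for kind, code in split_code_sets(sample):
        print(kind, code)
```

The sketch only separates the two 7-bit code sets; mapping an Indic index to a Gurmukhi letter would require the actual ISCII-91 table from [120].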


2.6.4. Speech Assessment Methods Phonetic Alphabet (SAMPA) 


SAMPA is an ASCII-based coding scheme developed primarily for European languages 
and later applied to some Asian languages. It is a machine-readable phonetic alphabet; 
consequently, it has been used successfully in many speech processing applications [8-9]. 
X-SAMPA is a variation of SAMPA that covers the entire IPA. SAMPROSA is a 
parallel system of prosodic notation (it keeps prosodic and segmental transcriptions distinct 
from one another). SAMPROSA is a limited-scope scheme: it covers only the prosodic 
layer of a transcription. Both X-SAMPA and SAMPROSA are considered 
extended versions of SAMPA [9]. 


2.6.5. Indian Script to Roman Transliteration (INSROT) 

INSROT is a limited-scope coding scheme designed for the transcription of 
Indian-language content in the Roman alphabet [8]. INSROT uses lower-case letters 
only; it is therefore not case-sensitive and supports case-insensitive search. 


2.6.6. wx-Roman 

This scheme is a limited-objective (or special-purpose) scheme designed to 
transliterate Devanagari to Roman script. The most salient feature of 
wx-Roman is that it uses a single keystroke for each consonant or vowel in the 
Devanagari script [8]. SAMPA, INSROT and wx-Roman have been employed to 
represent the Hindi and Punjabi languages. Arora et al. [8] have attempted to use 
SAMPROSA, the extended version of SAMPA, to represent the tonal variations 
in the Punjabi language, because a prosodic notation is required to represent its special 
tonal features. However, they have concluded that “the five 
voiced aspirates are firm in Hindi but are absent in Punjabi”. The same limitation 
becomes more obvious in Annexure I [9], where the five Punjabi tonemes (ਘ, ਝ, ਢ, ਧ, ਭ) 
are missing. 


It is clear that each of the six coding schemes discussed above has its merits, but none 
of them has proven suitable for the phonetic representation of the Punjabi language, 
for the following reasons: 


1. These schemes were designed with different foci, different languages, and different 
objectives in mind. 

2. Most of these schemes cannot be typed on a conventional typewriter or a computer 
keyboard. 

3. Most of these schemes are used inconsistently (different books use different symbols or 
notations for the same elements, so that the notation can differ even within the 
IPA). 

4. Some of these coding schemes are limited-objective schemes only. 

5. Dealing with the special symbols for vowels, nasalization and tones, and inserting 
diacritical marks, is laborious, irritating, and time-consuming in most of 
these schemes (especially where two diacritical marks over the ten vowel signs are 
needed for the Punjabi language in the Gurmukhi script). This further confirms the need 
for a new coding scheme as designed in this chapter. Tables 2.6 and 2.7 provide further 
insight into the necessity of the new coding scheme. 
Table 2.6: Condensed list of phonetic symbols for Punjabi 


Symbol GILL and GLEASON, Jr. [43] DULAI and KOUL [34] BAHRI [13] 








































































































= 
e ura - u oora 
a era : a Hr 
z iti 2 i iri 
A s sassa S sassa s sassa 
J h haha h haha h haha 
a k kokka k kokka k kakka 
y kh khokkha kh khokkha kh khakkha 
a g gogga g gogga g gagga 
al k, 8,8 kagga k, 68 kagga gh kasha 
2 n nona - - A hana 
J € cacta c cacca ch chachcha 
S ch choétha ch chaccha chh chhachha 
s J Jeija j jeiia J jajja 
8 ELi ija ELI cBija jh cajha 
= fi flofifia a nona fi flafid 
¢ t téka t téka t tZinka 
So th thottha th thottha th thaththa 
3 d dodda d doda d dadda 
u Edd tadda i.dd (Adda dh tadha 
= n nana n nana n nana 
3 t totta t totta t Tatta 
q th thottha th thatha th thaththé 
= d dodda d dadda d dadda 
qT i£dd tadda Edd t@dda dh tadha 
6 n nonna n nonna n nanna 
a P poppa P popa p pappa 
Ss ph phoppha ph phoppha ph phaphpha 
a b babba b babba b babba 
3 pbb pabba pbb pabba bh pabha 
4 m momma m momma m mamma 
a y yoya y yeya y yayya 
g r rara Tr rara r rara 
ro) 1 lolla 1 lolla 1 lalla 
= w wawa Vv vava Vv vava 
z t rara r rara r rar 
Table 2.6 (Continued) 




















A 8 8 Sas8a sh 
4 kh x Xoxxa kh 
a gh g gogga G 
WT Zz Zz ZOZzZa 

Ss f f foffa f 

S ] lolla 














2.7. NEW PHONETIC ALPHABET DEVELOPMENT: PUNJARPAbet 


As pointed out in Sections 2.1 and 2.2 of this chapter, a major problem in the 
area of Punjabi speech processing is that, at present, symbols consistent with the 
ARPAbet phonetic transcription do not exist for the Gurmukhi alphabet of the Punjabi 
language. Consequently, different authors use different notations. For example, see 
Table 2.6, where three different books by Gill and Gleason, Jr. [43], Dulai and Koul [34], 
and Bahri [13] use different notations. The ARPAbet notation has gained popularity amongst 
computerized speech researchers because of its simplicity and the credibility of the agency 
ARPA itself. Therefore, in this chapter, we have concentrated on developing a notation 
based on the ARPAbet. 


In this section, a new phonetic alphabet (interchangeably called the coding scheme, the new 
scheme, or just the scheme hereafter) is designed to encode the Punjabi corpus. Since 
the new scheme is based on the ARPAbet, let us call it the Punjabi ARPAbet to begin with. 
Combining PUNJ and ARPAbet, we derive the name PUNJARPAbet, in the same way as the 
name of the PUNJAB state is derived from the combination of the two words 
PUNJ (five) + AAB (water). The name Punjab literally translates to "five waters" [150], 
reflecting the fact that the undivided Punjab state (before partition in 1947) was the land of 
the five rivers: Jnana (Chenab), Jhelum, Ravi, Beas, and Sutlej. The PUNJARPAbet is an all 
upper-case coding scheme. It is consistent with the all upper-case version of the famous 
ARPAbet scheme. The existing schemes (e.g., IPA) almost always require special 
symbols. It is not an easy task to find these symbols, or to combine the appropriate 
diacritical marks with them. When data containing these symbols are transferred 
from one computer to another, portability problems are not unusual. 
Consequently, dealing with these symbols is laborious, irritating, and time-consuming. 


However, the new coding scheme designed in this work is very easy to follow and is 
well suited to an ordinary computer keyboard as well as an ordinary 
typewriter. Table 2.7 summarizes the new coding scheme. Table 2.7 consists of six 
columns and 58 rows: ten non-nasalized vowels, ten nasalized vowels, and 38 
consonants add up to 58, and each item occupies one row. Column 1 has the 58 Gurmukhi 
symbols. Column 2 has the corresponding IPA symbols. Column 3 has two sub-columns for 
the ARPAbet: one each for the Single Symbol Version and the Upper Case Version. Column 4 
has the corresponding SAMPA symbols. Column 5 consists of the newly designed 
PUNJARPAbet symbols: the original contribution of this work. Column 6 has two 
sub-columns for examples in English and Punjabi. The sixteen blank boxes in the two 
sub-columns under ARPAbet in Table 2.7 mean that at present there are no symbols in the 
ARPAbet to represent these sixteen Punjabi speech sounds (and the corresponding letters 
of the Gurmukhi script). These sixteen Punjabi speech sounds/Gurmukhi letters are: 


ਖ, ਖ਼, ਗ਼, ਘ, ਛ, ਝ, ਞ, ਠ, ਢ, ਣ, ਤ, ਧ, ਫ, ਭ, ਲ਼, ੜ 
kh, x, ɣ, k/g, ch, c/j, ɲ/ñ, ṭh, ṭ/ḍ, ṇ, t, t/d, ph, p/b, ḷ, ṛ 


In this work, new symbols have been designed for the above 16 Punjabi speech sounds to 
transcribe the newly designed text and speech corpus. These symbols can be found in 
column 5 of Table 2.7. A complete template of the PUNJARPAbet, including the newly 
designed symbols, is given in Table 2.8. 


PUNJARPAbet is a coding scheme that uses uppercase English letters as the base symbols 
for the Punjabi speech sounds. Hence the PUNJARPAbet is consistent with the all 
uppercase version of the famous ARPAbet scheme. In addition, it uses three lowercase 
letters, n, h and l, to represent nasalization, high tone and low tone respectively. The other 
two symbols used in the PUNJARPAbet, to represent the following seven Punjabi speech 
sounds or phonemes (ਟ, ਠ, ਡ, ਢ, ਣ, ਗ਼, ਲ਼), are the underline ( _ ) and the colon ( : ). The new 
scheme is very easy to use since all the symbols used in this scheme are readily available 
on an ordinary computer keyboard as well as on an ordinary typewriter. Consequently, 
the PUNJARPAbet is the most suitable coding scheme for transcribing the Punjabi 
language speech and text corpora. The evaluation of this scheme requires an appropriate 
corpus. This corpus is designed in Chapter IV, and the PUNJARPAbet coding scheme is 
thoroughly evaluated in Chapter VI. The complete corpus using the newly designed 
ARPAbet-compatible phonetic coding scheme PUNJARPAbet is given in 
Appendices C and D. 
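The structure of a PUNJARPAbet symbol described above (an uppercase base, an optional underline or colon, and optional lowercase marks for nasalization and tone) can be checked mechanically. The following Python sketch is only an illustration under stated assumptions: the thesis does not give a formal grammar, so the pattern, the suffix position of the marks, the whitespace separation of tokens, and the names `parse_token` and `parse_transcription` are all hypothetical.

```python
import re

# Hypothetical token grammar inferred from the prose (an assumption, not the
# formal definition of the scheme):
#   base      one or more uppercase letters, e.g. AH, KH, T
#   modifier  optional ':' or '_' attached to the base
#   suffixes  optional lowercase n (nasalization), h (high tone), l (low tone)
TOKEN_PATTERN = re.compile(r"([A-Z]+)([:_]?)([nhl]*)")


def parse_token(token):
    """Split one token into its base symbol, nasalization flag and tone."""
    match = TOKEN_PATTERN.fullmatch(token)
    if match is None:
        raise ValueError(f"not a well-formed token: {token!r}")
    base, modifier, suffixes = match.groups()
    if len(set(suffixes)) != len(suffixes):
        raise ValueError(f"repeated suffix in token: {token!r}")
    if "h" in suffixes and "l" in suffixes:
        raise ValueError(f"token carries both tones: {token!r}")
    tone = "high" if "h" in suffixes else "low" if "l" in suffixes else None
    return {"base": base + modifier, "nasal": "n" in suffixes, "tone": tone}


def parse_transcription(text):
    """Parse a whitespace-separated sequence of tokens (assumed layout)."""
    return [parse_token(token) for token in text.split()]


if __name__ == "__main__":
    # The token sequence is an arbitrary illustration, not a Punjabi word.
    for item in parse_transcription("K AAn T: AHh"):
        print(item)
```

Because every character involved is plain ASCII, such a check needs no special fonts or input methods, which is the practical advantage the scheme claims over IPA-based notations.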


Table 2.7: New Phonetic Coding Scheme for the GURMUKHI Script 





























Gurmukhi | IPA Symbol | ARPAbet Single Symbol Version | ARPAbet Upper Case Version | SAMPA | PUNJARPAbet | Example (English) | Example (Punjabi) 
wf o/a A AH a AH un Wo, WAT 
wr a a AA a: AA all WS, WT 
fe I I IH I IH it fre, fog 
ci i i IY i IY eat de, ug 
8 U U UH U UH push YA, Ad 
g u u UW u UW tool 5, AT 
Table 2.7 (Continued) 





































































































e e € EY e EY ate te, Ad 
at ele @ AE E AE at ule, Aa 
=f (0) oO OW (0) OW oat Ge, Ag 
wf 3 c AO re) AO odd nig, Hd 
nA a/Aa A AH a AHn under nizd, WZ 
wt, Ar a a AA a AAn auntie niet, we 
fe fH i I IH I THn ink fea, fo 
at, AT i i IY i TYn wet, Se 
g.8 U U UH U UHn Gas, Hed 
cash a it UW a UWn Gy, de 
aA é e EY e EYn saint Ae, de 
ut A ES @ AE E AEn and ul, daz 
g, A 6 (0) OW (0) OWn Gard, de 
ui A 5 c AO O AOn A, Age 
A S S Ss s S seat Ate, AST 
w s/f S SH S SH sheet Hite, FAS 
J h h HH h H heart Jde, fags 
a k k K k K keep atu, as" 
uy kh k_h KH Yet, dT 
H x xX x xuda yer, WS 
I g g G g G great , SSH 
a g G G asl, JOH 
Wy k/g gh GH wd, AY 
] n/n G NX N NX sing Hiss, ase 
J c/tf Cc CH tS CH church Jdd, wd 
s ch tS_h CHH ost, es 
a ji ds J JH dZ J jeep rity, Hts 
co Zz Z Z Zz Z zero mtd, Fa 
e/j dZ_h JH Bo, va 
p/it J NJ de, SE 
Table 2.7 (Continued) 










































































¢ t/t t T t T: team ay, ht 
6 th t_h TH oe, ota 
3 d/d d D d D: drama 3dH,”, 3d 
u t/d dh DH fez, da 
= n n N ve, uret 
3 t t 36, std 
a th/0 T TH th TH thing faa, a 
= d/8 D DH d D that te, wade 
a t/d d_h DH UH, Ua 
roy n n N n N neat ate, oH 
a P P Pp P para der, ute 
S ph p_h PH phone 26, cHEt 
z f f F f F five rae, A 
a b b B b B book ga, 8g 
3 p/b b_h BH ae, St 
cal m M EM m M mother Hed, Ha 
W y/j y Y j Y yard Ws, WJ 
J r r R r R rat Je, oS 
5 1 1/L L/EL l L love Be, fuer 
3 | L TS, AS 
= Vv Vv Vv Vv Vv victory feaed, fue" 
a i r RH Aad, oF 


2.8. CONCLUSION 


In this chapter, a new phonetic coding scheme has been designed to encode the 
Punjabi corpus. Since the new scheme is based on the ARPAbet, we have 
called it the Punjabi ARPAbet to begin with. Combining PUNJ and ARPAbet, we have 
derived the name PUNJARPAbet, in the same way as the name of the PUNJAB state has 
been derived from the combination of the two words PUNJ (five) + AAB (water). 
PUNJARPAbet is a coding scheme that uses uppercase English letters as the base symbols 
for the Punjabi speech sounds. Hence the PUNJARPAbet is consistent with the all 
uppercase version of the famous ARPAbet scheme. 


The idea of the new scheme originated to address four issues. Firstly, the 
existing schemes (e.g., IPA) almost always require special symbols. It is not an easy task 
to find these symbols, or to combine the appropriate diacritical marks with them. 
Secondly, when data containing these symbols are transferred from one computer to 
another, portability problems are not unusual. Consequently, dealing with 
these symbols is laborious, irritating, and time-consuming. Thirdly, no two authors agree 
on the representation of the Punjabi speech sounds, as shown in Table 2.6, which compares 
Gill and Gleason, Dulai and Koul, and Bahri. Lastly, symbols for at least 16 letters of the 
Punjabi language/Gurmukhi script do not exist in the ARPAbet. These 16 letters (ਖ, ਖ਼, ਗ਼, 
ਘ, ਛ, ਝ, ਞ, ਠ, ਢ, ਣ, ਤ, ਧ, ਫ, ਭ, ਲ਼, ੜ) have been identified in red colour in Table 2.8. 
Table 2.7 summarizes the new coding scheme. Table 2.8 gives the complete template of the 
new coding scheme PUNJARPAbet. The new scheme is very easy to use since all the 
symbols used in this scheme are readily available on an ordinary computer keyboard as 
well as on an ordinary typewriter. Consequently, the PUNJARPAbet is the most suitable 
coding scheme for transcribing the Punjabi language text corpora. 


Table 2.8: PUNJARPAbet TEMPLATE 
















































































ਅ | ਆ | ਇ | ਈ | ਉ | ਊ | ਏ | ਐ | ਓ | ਔ 
AH | AA | IH | IY | UH | UW | EY | AE | OW | AO 

ਸ | ਸ਼ | ਹ | ਕ | ਖ | ਖ਼ | ਗ | ਗ਼ | ਘ | ਙ 
S | SH | H | K | KH | X | G | G | GH | NX 

ਚ | ਛ | ਜ | ਜ਼ | ਝ | ਞ 
CH | CHH | J | Z | JH | NJ 

ਟ | ਠ | ਡ | ਢ | ਣ 
T: | TH: | D: | DH: | N 

ਤ | ਥ | ਦ | ਧ | ਨ 
T | TH | D | DH | N 

ਪ | ਫ | ਫ਼ | ਬ | ਭ | ਮ 
P | PH | F | B | BH | M 

ਯ | ਰ | ਲ | ਲ਼ | ਵ | ੜ 
Y | R | L | L | V | RH 
CHAPTER III 
SPEECH SIGNAL PROCESSING² 


3.1. INTRODUCTION 


In this chapter, the essential concepts required for developing the speech production 
models for linear prediction analysis and synthesis of Punjabi speech are discussed. 
These concepts include the speech signal, speech sounds and speech processing, speech 
production models, and the challenges presented by speech signal processing. The 
term “speech” conveys different meanings to different people, in different 
contexts and environments, and in different professional fields. It might mean 
speech-related medical issues such as stuttering to a doctor or a patient, the clarity of the 
thoughts conveyed by a speaker to a layman or a political worker, and an acoustic 
waveform to a speech scientist or an IT professional. A speaker formulates the speech at 
the linguistic level in his/her brain, generates it via the mouth and nose at the physiological 
level, and conveys it at the acoustic level. The listener’s (and the speaker’s own) ears 
receive it at the physiological level before it is decoded and understood at the linguistic 
level. This process, generally called the Speaker Chain [Rabiner and Schafer, 89, pp. 
5, pp. 125], is summarized in Fig. 3.1. 















[Block diagram with the labels: SPEAKER, LISTENER, Brain, Sensory Nerves, Feedback, 
Vocal Muscles, Sound Waves; stages: Linguistic Level, Physiological Level, Acoustic Level, 
Physiological Level, Linguistic Level] 


Figure 3.1: The Speaker Chain: from message, to speech signal, to understanding 
[due to P. B. Denes and E. N. Pinson (1993, 2nd edition)] 





² Parts of the contents of this chapter have been published in a book of Punjabi University (ISBN: 81-302-0177-1), 2010, 
and in the International Journal PARKH, Vol. I, Jan-June 2013, pp. 179-188. Some contents of this chapter were presented at 
international conferences including the World Punjabi Conference (WPC-2009), Punjab University, India and the International Punjabi 
Development Conference (IPDC-2009), Punjabi University, India. 




In communication-systems terminology, the speaker (or talker) is the 
transmitter, the listener is the receiver, and the speech signal is the message of 
interest. 


3.2. SPEECH SIGNAL 


The purpose of speech is the communication of ideas expressible in some language. As 
Edward Sapir [116, pp. 17] said, “Speech is so familiar a feature of our daily life that we 
rarely pause to define it. It seems as natural to man as walking, and only less so than 
breathing.” Mario Pei [116, pp. 17] confirms the same fact from a different angle by 
stating that the “systems of communication not based on speech, while extremely useful 
on specific occasions, are generally inferior to the spoken tongue as meaning-conveyors.” 


In the signals and systems theory, speech is represented as an acoustic pressure 
waveform, i.e., as a signal carrying the information or message. 


[Figure labels: soft palate (velum), nasal cavity, hard palate, lips, vocal folds/glottis, 
chest cavity] 


Figure 3.2: Cross-sectional view of the human vocal tract mechanism 
showing some of the major articulators in speech production 
(due to Markel and Gray [76]) 


The representation of a speech signal by its acoustic waveform, or by some 
parametric model based on the waveform, has been found to be the most useful in 
practical applications and in understanding the complex structure of the waveform. 

The acoustic speech waveform is an acoustic pressure wave which originates 
from voluntary physiological movements of the major anatomical structures 
involved in speech generation, as shown in Fig. 3.2. 


The organs of speech involved in the production of various speech sounds [34] 
may be classified into two categories: 1. Articulators, and 2. Points of Articulation. The 
articulators are those movable organs in the speech tract which move towards the points 
of articulation when sounds are produced. The articulators are the lower lip, the tip of the 
tongue, the blade of the tongue, the center of the tongue and the back of the tongue. The 
points of articulation are relatively stationary organs of speech, such as the upper lip, the 
tip of the teeth, the ridge, the hard palate, the soft palate, and the uvula (in full, the palatine 
uvula). 


The vocal cords play an important role in the production of speech sounds. The 
air stream coming from the lungs passes through the windpipe. The larynx is the 
uppermost part of the windpipe. It contains two lip-like elastic membranes known as 
the vocal cords. The vocal cords vibrate when they are brought very near to each other and 
an air current passes through them. This gives rise to voicing. Speech sounds which are 
produced in this manner are called voiced sounds. If the vocal cords are not brought 
near to each other and remain apart, the air current passes through them noiselessly, and 
the speech sounds thus produced are called unvoiced (or voiceless) sounds [34]. 


A schematic diagram of the human speech production mechanism (based on 
Flanagan [40]) is shown in Fig. 3.3. More specifically, speech is produced by the actions 
of the nose, mouth, jaws and throat upon the entire breath system. In normal speech 
production, the chest cavity expands and contracts and forces the air from the lungs out 
through the trachea past the glottis (the opening between the vocal folds is called the glottis). 
Depending upon the position of the trap-door velum, the air stream is expelled 
through the mouth cavity, through the nasal cavity, or through both, and is perceived as speech. 


The vocal tract is a non-uniform acoustic tube which extends from the glottis to 
the lips and is about 17 cm long for an average adult male. The vocal tract varies in shape 
and size as a function of time. This time variation is caused by the continuously 
changing positions of the various articulators (the major anatomical components 
participating in speech production, e.g., lips, tongue, jaw and velum, have already been 
described in Section 2.1). As an example, the cross-sectional area of the vocal tract varies 
from 0 to 20 sq. cm, depending upon whether the lips are closed or the mouth and jaws are 
wide open. 
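To give a feel for what a 17 cm tube implies acoustically, the vocal tract can be idealized as a uniform, lossless tube that is closed at the glottis and open at the lips. This idealization is an assumption introduced here for illustration (the text above stresses that the real tract is non-uniform and time-varying). Such a tube resonates at odd multiples of c/(4L), where c is the speed of sound and L the tube length. The function name and the value c = 340 m/s are likewise assumptions.

```python
def uniform_tube_resonances(length_m, count=3, speed_of_sound=340.0):
    """Resonances (Hz) of an idealized uniform tube closed at one end.

    Quarter-wavelength formula: F_k = (2k - 1) * c / (4 * L), k = 1, 2, ...
    """
    if length_m <= 0:
        raise ValueError("tube length must be positive")
    return [(2 * k - 1) * speed_of_sound / (4.0 * length_m)
            for k in range(1, count + 1)]


if __name__ == "__main__":
    # A 17 cm tube gives resonances near 500, 1500 and 2500 Hz.
    for frequency in uniform_tube_resonances(0.17):
        print(round(frequency), "Hz")
```

Moving the articulators changes the cross-sectional area along the tube and shifts these resonances, which is why the shape of the tract determines the sound that is produced.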
Figure 3.3: Schematic diagram of the Human Speech 
Production Mechanism (due to Flanagan [40]) 


3.3. SPEECH SOUNDS AND SPEECH PROCESSING 


In order to conduct speech analysis and synthesis in any language, the speech 
sounds must be understood in terms of the vocal tract system, so that an appropriate 
computer model can be developed. This section deals with the vocal tract system, so that 
the appropriate speech production model can be developed in the next section. 


Speech signals are composed of a sequence of sounds, and the distinctive sounds 
of a language are called its phonemes. These sounds and the transitions 
between them form the symbolic representation of the information. 

The Punjabi speech sounds have been described in Section 2.5, where phonemes 
were classified into vowels, diphthongs, semivowels and consonants. These four main 
classes are further broken down into sub-classes depending upon the manner and place of 
articulation of the sounds within the vocal tract. 


There are three primary modes of exciting the vocal tract system. Based on these 
three primary modes, the speech sounds can be classified into three distinct classes as 
follows: 

i) Voiced sounds 
ii) Unvoiced or voiceless sounds 
iii) Plosive sounds 


For voiced sounds, the source of excitation is at the glottis, and the sounds are 
generated by broad-band quasi-periodic puffs of air produced by the vibrating vocal 
cords. Typical examples of these sounds are voiced consonants, nasal consonants, vowels 
and semi-vowels. 


For unvoiced or voiceless sounds, the source is at some point of constriction in 
the vocal tract, anywhere from the glottis to the lips. The vocal cords are spread apart (no 
voicing) and the sounds are produced by turbulent quasi-random airflow. Typical 
examples of unvoiced or fricative sounds are found among the non-nasal consonants. 


For plosive sounds, the source is at the point of closure, and the sounds are 
produced by suddenly releasing the air pressure built up behind the total constriction. 
Unvoiced stop consonants are examples of such sounds. 


3.4. SPEECH PRODUCTION MODELS 


Sound waves are generated by vibration and are propagated in air or other media 
by vibrations of the particles of the medium. (Sound waves cannot 
travel through a vacuum: sound needs a medium such as air, a liquid (e.g., 
water), or a solid in which to travel, and since a vacuum contains no particles, 
sound can neither propagate through it nor be heard.) A set of partial differential 
equations describing the motion of air in the vocal system can be obtained (Fant [38-39], 
Flanagan [40]), but the solution of these equations is very difficult. A detailed acoustic 
theory must consider the effects of many complicated physical processes [76-77, 79, 99] 
associated with speech production, including the following: the position of the tongue; 
changes in the length, shape and contour of the vocal tract; the resonance of the vocal 
cords; the mass and elasticity of the muscles along the vocal tract walls; dental effects; 
the viscosity of the mucus in the vocal tract; the position of the velum and the subsequent 
coupling with the nasal tract; radiation effects at the lips; losses due to heat conduction; 
and temperature gradients [24, 38-40, 82-83, 85-89]. A major concern in the area of 
speech processing has been, and still remains, that a detailed acoustic theory 
incorporating all the above processes and effects is not yet available. 


Many models have been proposed to describe the complicated process of speech 
production. None of these models, alone, can account for all of the observed 
characteristics of human speech (as described above); nor is it probably desirable to 
postulate such a model, because its structure would inevitably be complex. 


For convenience, it is desirable to have models that are linear as well as 
time-invariant. However, the speech production mechanism is neither linear nor 
time-invariant. On the contrary, speech is a continuous but slowly time-varying, 
non-stationary, quasi-periodic waveform. Also, the fact that the glottis is coupled to the 
vocal tract results in non-linear characteristics [24-31, 87-89, 99]. 


All speech models make two basic assumptions, thereby reducing complexity 
at the cost of some accuracy: 


(i) The vocal tract system and the source of excitation are independent, so that 
the vocal tract system can be excited by any of the possible sources of 
excitation. This assumption holds well for the majority of the cases of 
interest, but it becomes invalid in the case of transient 
sounds like ‘p’ in ‘pot’, voiced fricatives, nasals, and whisper. 


(ii) The characteristics of speech are time-invariant over short segments of time 
(approximately 15 to 25 milliseconds). This implies that, in order to represent the 
slowly time-varying characteristics of speech that indicate a new configuration 
of the vocal tract, the control parameters of the model need to be updated 
only for each new speech segment, and not for each speech sample. 
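Assumption (ii) is what allows a speech signal to be processed segment by segment. The following Python sketch cuts a sampled signal into consecutive, non-overlapping segments of fixed duration. The function name, the 20 ms default (chosen from within the 15 to 25 ms range above), the 8000 Hz sampling rate in the example, and the decision to discard an incomplete final segment are all assumptions made for illustration.

```python
def split_into_frames(samples, sample_rate_hz, frame_ms=20.0):
    """Cut a sampled signal into consecutive non-overlapping segments.

    Each segment spans frame_ms milliseconds; a trailing segment that is
    shorter than a full frame is discarded (a simplifying assumption).
    """
    frame_len = int(round(sample_rate_hz * frame_ms / 1000.0))
    if frame_len <= 0:
        raise ValueError("frame length must be at least one sample")
    last_start = len(samples) - frame_len
    return [samples[start:start + frame_len]
            for start in range(0, last_start + 1, frame_len)]


if __name__ == "__main__":
    signal = list(range(1000))          # stand-in for 1000 speech samples
    frames = split_into_frames(signal, sample_rate_hz=8000)
    print(len(frames), "frames of", len(frames[0]), "samples")
```

At 8000 samples per second a 20 ms segment holds 160 samples, so the model parameters would be updated once per 160 samples instead of once per sample.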


Based on these two main assumptions, many speech production models have been 
proposed by various research scientists (e.g., Fant [38-39], Flanagan [40], Schafer [99, 
87-89], Atal and Hanauer [11-12], Itakura and Saito [52]) throughout the world to 
describe major characteristics of speech. Although many models have been attempted, 
the following three speech models are noteworthy for this project: 


1. Linear speech production model [38-39, 40] 
2. Digital model of speech production [87-89, 99] 
3. Linear prediction model of speech production [11-12, 52, 74-76] 


These models are introduced in the next three sections. 
3.4.1. LINEAR SPEECH PRODUCTION MODEL 


This model, shown in Fig. 3.4, was developed by Fant [38-39, 99] in the late 1950s 
(1960). Fant covered the assumptions in detail, and these were later elaborated by 
Flanagan [40]. Flanagan presented the results of some carefully conducted experiments on 
acoustic radiation supporting Fant’s justification as well as the mathematical derivation of 
this model. 


[Block diagram: e(t) → glottal model G(Z) → vocal tract model V(Z) → lip radiation L(Z) → s(t)] 


Figure 3.4: Linear speech production model 
(due to Fant [38-39]) 


In this model, the vocal tract system is simulated by three different low-pass filters, one 
each for the glottal model, the vocal tract model and the lip radiation model. The excitation 
is an impulse train for voiced sounds and flat-spectrum random noise for 
unvoiced sounds. The impulses simulating the initiation of puffs of air for voiced sounds 
are spaced P samples apart, where P is the pitch period (the rate of oscillation of the vocal 
cords is called the pitch frequency or fundamental frequency F0 for the particular speech 
segment, and its reciprocal 1/F0 is known as the pitch period P). The random noise 
simulates the pressure build-up waveform and the quasi-random turbulence for unvoiced 
sounds. 


The linear speech production model can be described in Z-transform terminology as: 


$S(Z) = E(Z)\,G(Z)\,V(Z)\,L(Z)$ (3.1) 
where 

$S(Z) \leftrightarrow s(kT) = s(t)\big|_{t=kT}$ (3.2) 

$E(Z) \leftrightarrow e(kT) = e(t)\big|_{t=kT}$ (3.3) 

3.4.2. DIGITAL MODEL OF SPEECH PRODUCTION 
Schafer [87-89, 99] presented the ideas of the previous section in digital form 
(Fig. 3.5), probably in a slightly more sophisticated manner than Fant and Flanagan 
[38-40]. The digital model of speech production suggests that the vocal tract system can be 
represented by a single time-varying digital filter excited either by an impulse train 
generator (for voiced sounds) or by a random number generator (for unvoiced sounds). 
A gain parameter between the excitation sources and the excited system (the digital filter) 
allows some flexibility in the output acoustic level, and the digital output corresponds to 
the sampled speech waveform. 


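[Block diagram: an impulse train generator (controlled by the pitch period) and a random
number generator feed a switch; the selected excitation, scaled by a gain (amplitude),
drives a time-varying digital filter (controlled by the digital filter coefficients, i.e.,
the vocal tract parameters) that outputs the speech samples.]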


Figure 3.5: Digital model of speech production 
(due to Schafer [87-89, 99]) 
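
The digital model described above can be illustrated with a minimal sketch in Python. This is not the implementation used in this work; the function names, the single filter coefficient (0.9), the gain (0.5), and the frame length (240 samples) are hypothetical values chosen only for illustration, and the filter coefficients are held fixed for one frame rather than varied over time.

```python
import random


def pitch_period_samples(f0_hz, fs_hz):
    """Pitch period P in samples: P = fs / F0, rounded to the nearest sample."""
    return round(fs_hz / f0_hz)


def excitation(n_samples, voiced, pitch_period=80, seed=0):
    """Impulse train spaced `pitch_period` samples apart (voiced),
    or uniformly distributed random noise (unvoiced)."""
    if voiced:
        return [1.0 if n % pitch_period == 0 else 0.0 for n in range(n_samples)]
    rng = random.Random(seed)
    return [rng.uniform(-1.0, 1.0) for _ in range(n_samples)]


def recursive_filter(x, coeffs, gain=1.0):
    """y[n] = gain * x[n] + sum_k coeffs[k-1] * y[n-k].
    The coefficients are held fixed here, i.e. for a single frame."""
    y = []
    for n, xn in enumerate(x):
        acc = gain * xn
        for k, c in enumerate(coeffs, start=1):
            if n - k >= 0:
                acc += c * y[n - k]
        y.append(acc)
    return y


# One voiced and one unvoiced frame (illustrative values only):
# F0 = 100 Hz at fs = 8 kHz gives a pitch period of P = 80 samples.
P = pitch_period_samples(100, 8000)
voiced_frame = recursive_filter(excitation(240, True, P), [0.9], gain=0.5)
unvoiced_frame = recursive_filter(excitation(240, False), [0.9], gain=0.5)
```

The voiced/unvoiced switch of Fig. 3.5 corresponds to the `voiced` flag, and a time-varying filter would be obtained by changing `coeffs` from frame to frame.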
3.4.3. LINEAR PREDICTION MODEL OF SPEECH PRODUCTION


The linear speech production model, the digital model of speech production, and
the following two conclusions of Fant, Flanagan, Schafer [38-40, 87-89, 99], and other
speech researchers led Atal and Hanauer [11-12] to develop the linear prediction model
of speech production (Fig. 3.6):


i) the transfer function of the vocal tract has no zeroes for non-nasal speech sounds,
and the vocal tract can be adequately represented by an all-pole filter for these
sounds;


ii) the zeroes required by the vocal tract transfer function for nasals and unvoiced
sounds lie within the unit circle in the Z-plane, and therefore each factor representing
zeroes in the numerator of the transfer function can be approximated by multiple
poles in the denominator.


This model has the following distinct features:


a) The four control parameters of the model, i.e., the linear prediction coefficients
(the $a_j$’s), the position of the voiced/unvoiced (V/UV) switch, the pitch period P
of the voiced frame, and the gain G (r.m.s. value of the speech samples), give a
complete representation of the speech waveform for a particular frame (a speech
segment during which the vocal tract configuration is assumed to be
time-invariant is generally called a frame).


b) The effects of the glottal flow, the vocal tract and the lip radiation are
combined in a single all-pole recursive filter. If the number of poles (p) is high
enough, then this simplified all-pole model gives a good representation of
almost all speech sounds. The additional advantage of the all-pole model is
that all the control parameters of the model can be evaluated accurately and
directly from the speech wave in a very straightforward and computationally
efficient manner.


c) The all-pole model in the frequency domain means that the current speech
sample is approximated (or predicted) as a linear combination of the past p
speech samples in the time domain, using the linear prediction coefficients as
the weighting coefficients (this is where the name linear prediction comes
from).


d) Speech can be encoded in terms of the four control parameters and can be
synthesized from the control parameters in the same manner by a linear
prediction synthesis model (L.P. Synthesizer, Fig. 5.2, Chapter V) proposed
by Atal and Hanauer [11-12].
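
Features (b) and (c) can be written compactly as follows. This is a sketch in one common notation, not a formula quoted from the references: the sign convention and the symbol u(n) for the unit-level excitation (impulse train or random noise) are assumptions introduced here for illustration, while p, G, and the prediction coefficients are the quantities named in the list above.

```latex
s(n) = \sum_{j=1}^{p} a_j \, s(n-j) + G \, u(n),
\qquad
H(Z) = \frac{S(Z)}{U(Z)} = \frac{G}{1 - \sum_{j=1}^{p} a_j Z^{-j}}
```

The first relation is the time-domain statement of feature (c); the second is the corresponding all-pole transfer function of feature (b).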
[Block diagram: a voiced (periodic pulse) or unvoiced (noise) excitation drives a
time-varying linear predictor P.]

Figure 3.6: Linear prediction model of speech production 
(a) Time-domain representation (b) Frequency-domain representation 
(due to Atal and Hanauer [11-12, 74-76]) 
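
As an illustration of how the linear prediction coefficients and the gain of one frame could be estimated, the following is a minimal Python sketch of the autocorrelation method with the Levinson-Durbin recursion. This is one common textbook procedure and is an assumption here; it is not necessarily the procedure used by Atal and Hanauer or in this work. The function names and the test frame (a decaying exponential) are hypothetical.

```python
import math


def autocorrelation(frame, max_lag):
    """Short-time autocorrelation r[0..max_lag] of one frame."""
    n = len(frame)
    return [sum(frame[i] * frame[i + lag] for i in range(n - lag))
            for lag in range(max_lag + 1)]


def levinson_durbin(r, order):
    """Solve the normal equations for the predictor
    s[n] ~ sum_{k=1..order} a[k-1] * s[n-k].
    Returns (a, err), where err is the residual prediction energy."""
    a = [0.0] * order
    err = r[0]
    for i in range(order):
        if err <= 0.0:
            break  # silent or degenerate frame: keep the remaining coefficients at zero
        k = (r[i + 1] - sum(a[j] * r[i - j] for j in range(i))) / err
        new_a = a[:]
        new_a[i] = k
        for j in range(i):
            new_a[j] = a[j] - k * a[i - 1 - j]
        a = new_a
        err *= 1.0 - k * k
    return a, err


def rms(frame):
    """Gain G taken as the r.m.s. value of the speech samples in the frame."""
    return math.sqrt(sum(x * x for x in frame) / len(frame))


# Illustrative frame: s[n] = 0.5 ** n, which satisfies s[n] = 0.5 * s[n-1].
frame = [0.5 ** n for n in range(50)]
coeffs, residual_energy = levinson_durbin(autocorrelation(frame, 2), order=2)
gain = rms(frame)
```

For this test frame the first coefficient comes out close to 0.5 and the second close to zero, as expected for a signal in which each sample is half of the previous one.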


3.5. SPEECH SIGNAL PROCESSING: A GRAND CHALLENGE 


Speech signal processing has been declared as one of the most interesting 
challenges by many authors in this field. Two representative examples are quoted here: 


(1) According to Yegnanarayana [129, p. 551], “Communicating with a machine
in a natural mode such as speech brings out not only several technological challenges, but
also limitations in our understanding of how people communicate so effortlessly. The key
is to understand the distinction between speech processing (as is done in human
communication) and speech signal processing (as is done in a machine). When people
listen to speech, they apply their accumulated knowledge of speech in relation to a
language to capture the message. In this process, it is interesting to note that the input
speech is processed selectively using the knowledge sources acquired over a period of 
time such as sound units, acoustic-phonetics, prosody, lexicon, syntax, semantics and 
pragmatics. This processing varies from person to person, and it is difficult for any 
individual to articulate the mechanism he/she is using in processing the input speech. 
This makes it difficult to write a program to perform the task of extracting message in 
speech by a machine. It should be noted that, for a machine, only the speech signal is 
available in the form of a sequence of samples, the rest of the mechanism involving 
identification of knowledge sources and invoking them on the input signal is a scientific 
challenge. Thus speech signal processing is one of the most interesting challenges that 
arouse curiosity among different scientific groups, such as linguists, phoneticians, 
(psycho) acousticians, electrical engineers, computer scientists and application 
engineers.” 


(2) The Office of Science & Technology Policy (Washington, D.C., USA) has 
identified a group of fundamental problems in science and engineering as Grand 
Challenges for high performance computing. Computerized Speech Understanding is one 
of those grand challenges. Etter [36, pp. 8-10] covers at least five of these challenges as 
given below: 


Computerized Speech Understanding
Human Genome or DNA (DeoxyriboNucleic Acid) Project
Weather, Climate, and Global Change Prediction
Vehicle Performance Improvement
Enhanced Oil and Gas Recovery


It is with this background and understanding of one of the most interesting challenges 
and one of the grand challenges in mind that we are approaching Speech Signal 
Processing in the Punjabi Language. 


3.6. CONCLUSION 


The Linear Prediction Model of Speech Production has been specifically chosen
for this project because of its merits and relatively simple characteristics, as explained
in this chapter and earlier in Section 1.7. Using this model, we have synthesized the
Punjabi speech sentences in Chapter V.
CHAPTER IV
A NEW CORPUS IN THE PUNJABI LANGUAGE 


Digital signal processing [11-12, 15, 24-25, 79, 83, 85-90, 99-102] has been 
successfully applied to many types of signals including telecommunication signals, audio 
signals, image processing, radar signals, sonar signals, signals in geophysics, and speech 
signals. Digital speech processing includes many fields: (a) Speech analysis, e.g., linear 
prediction (LP) analysis [11-12, 52, 74-76] (b) Speech synthesis, e.g., LP synthesis, text- 
to-speech (TTS) synthesis (c) Speech enhancement or noise cancellation (d) Speech 
coding (speech storage and transmission) (e) Speaker separation (f) Speaker identification 
(g) Language identification (h) Automatic speech recognition (ASR) which also includes 
continuous speech recognition (CSR), discrete utterance recognition, and keyword 
spotting (i) Pitch and formant estimation (j) Aids-to-the handicapped, and many more. 
Most of these applications require reliable speech and text corpora as the starting point. 
In this chapter, a new text and speech corpus in the Punjabi language has been developed. 


4.1. INTRODUCTION 


A corpus plays a central role in any speech processing work (e.g., Speech Analysis
and Synthesis, Language Engineering, Computational Linguistics, Cross-language
Information Retrieval, Speech Recognition, Speech Translation, Multi-Lingual
Lexicography, Developing Language-Learning tools). A corpus is a large body of language
material in its natural state, either as recorded speech or as written text. Corpora in
machine-readable form are considered more useful and versatile than corpora in their
natural state, because the machine-readable form can be used, processed and
manipulated in a variety of ways that are not possible with the natural formats. The
machine-readable form of a corpus is, however, always derived from the corpus in its
natural state, and not the other way around.


In this chapter, a new corpus for the Punjabi language has been designed. 


Punjabi was the native language of the Punjab state of undivided India. In
1947, when the British rulers partitioned India into two countries (India and Pakistan),
the Punjab state was also bifurcated into two states: East Punjab (in India) and West
Punjab (in Pakistan). As with any other language of the world, there are many dialects of
the Punjabi language in both countries. Punjabi University (Patiala, India) published a
list of 31 dialects of the Punjabi language (see Section 2.1). The main dialects of the
Punjabi language in India are Malwai, Majhi, Doabi, and Puadhi, and the main dialects of
the Punjabi language in Pakistan are Multani, Pothohari, and Lehndi. It would be an
enormous task to design a corpus that completely describes all dialects of the Punjabi
language. This work, therefore, concentrates on only one dialect of the Punjabi language:
the Malwai dialect.


Note: The results obtained in this chapter have been published in the International
COCOSDA-2013 / CASLRE-2013 Conference on Asian Spoken Language Research and
Evaluation (indexed and published in IEEE Xplore), and in the International Symposium on
Frontiers of Research on Speech and Music (FRSM-2012), Jan 2012, pp. 223-227.


The Malwai dialect has been chosen because of the five major reasons described 
in Chapter 1. 


In the Punjab state of India, the Punjabi language is written in the Gurmukhi
script, whereas in the Punjab state of Pakistan, the Punjabi language is written in the
Persian (or Shahmukhi) script. Since the Malwai dialect in the Punjab state of India is
written in the Gurmukhi script, the speech sentences in the corpus designed in this
chapter have been recorded in the Malwai dialect of the Punjabi language and written
in the Gurmukhi script.


4.2. CORPUS DESIGN 


Designing databases and corpora is generally considered the first major step in 
speech processing. 


Corpora are generated by recording an optimal set of textual words/sentences spoken
by the native speakers of a particular language. We have used high quality microphones
for the recordings of selected words and sentences. We have also maintained a
pronunciation vocabulary by mapping each word to a sequence of sound units. The
spoken language not only carries the linguistic information, but also conveys many
features related to nonlinguistic information such as the speaker’s emotions, gender,
age, social status and cultural background.


The quality of the corpora and databases plays a decisive role in determining the
quality of the speech processing work based or tested on these corpora and databases.
The speech processing work can be considered as a building constructed on the
foundation of these corpora and databases. Clearly, a strong building cannot be
constructed on a weak foundation. Many important attempts at the corpus design of
Indian languages have been described in the literature.


Examples quoted in the literature, and published during the last 10 years are 
included in this work to convey the latest trends in the corpus design [2, 9, 51, 106, 120]. 
Out of the literature published on the corpus design, 16 representative papers are listed 
here: 


five articles in [120]: V. N. Shukla (pp. 87-94); G. S. Lehal and Meenu Bhagat
(pp. 128-141); S. Chanda, S. Sinha and U. Pal (pp. 244-247); Om Vikas (pp. 280-300);
Om Vikas (pp. 305-330);


five articles in [2]: S. S. Agrawal, K. Samudravijaya and Krunesh Arora (pp. 21-27);
Sunita Arora, Krunesh Arora and S. S. Agrawal (pp. 122-126); Sunita Arora, Garima
Sinha, Rohit Parashar and S. S. Agrawal (pp. 127-131); Deepak Dhiman, Samita Tayal,
S. S. Agrawal and N. K. Sharma (pp. 303-305); Vijay K. Gugnani, Krunesh K. Arora, V.
N. Shukla, Sunita Arora and Mohan Gour (pp. 317-322);


four articles in [51]: Shyam S. Agrawal, Karunesh K. Arora, Sunita Arora and 
Krishan K. Goswami (pp. 20-23); Shyam S. Agrawal, Karunesh K. Arora, Sunita Arora 
and K. Samudravijaya (pp. 94-97); Karunesh K. Arora, Sunita Arora and Shyam S. 
Agrawal (pp. 330-333); Sunita Arora, Karunesh K. Arora and Shyam S. Agrawal (pp. 
176-179); 


One article [9]: Krunesh Arora, Sunita Arora, Somi Ram Singla and S. S. 
Agrawal; 


One article [106]: Paramjit Singh Sidhu. 


4.3. EXAMPLES OF THE CORPORA 


Some examples of the corpora in the Indian languages found in the literature [2, 9, 
51, 106, 120] are included in this section. Two broad categories of corpora have been 
described in literature: (a) Text Corpora, and (b) Speech Corpora / Databases. 


Amongst many other examples, the major examples of the text corpora include: 
Kolhapur Corpus of Indian English (KCIE), Technology Development for Indian 
Languages (TDIL) corpora project (Govt. of India), Enabling Minority Language 
Engineering (EMILLE) corpus, Central Institute of Indian Languages (CIIL) corpus, 
GyanNidhi corpus of C-DAC, Department of Information Technology (DIT) consortia 
projects, and the Hindi Samgraha project of Mahatma Gandhi International Hindi 
University (MGIHU). 


The major examples of the speech corpora / databases include the corpora 
developed at C-DAC (Centre for Development of Advanced Computing), C-DAC and 
Central Scientific Instruments Organization (CSIO), Indian Institute of Technology (IIT), 
Indian Institute of Science (IISc), International Institute of Information Technology (IIIT) 
and HP Labs, Central Electronics Engineering Research Institute (CEERI) and Tata 
Institute of Fundamental Research (TIFR), Central Forensic Science Lab (CFSL) and 
Centre for Artificial Intelligence and Robotics (CAIR), Kamrah International Institute of
Technology (KIIT), Thapar University, and Punjabi University.


These representative examples discussed below have influenced many new 
corpora such as the one designed in this work. 
4.3.1. Text Corpora
Several major examples of the text corpora are briefly discussed here. 


1. Kolhapur Corpus of Indian English (KCIE) 


This corpus is unanimously considered the first corpus in an Indian language. It
was developed in 1988 at Shivaji University, Kolhapur, India. It consists of one million
words of Indian English.


2. TDIL corpora project (DIT) 


The Department of Information Technology (DIT), Govt. of India, initiated a
large corpora design effort (more than one million machine-readable words for all major
Indian languages) under its Technology Development for Indian Languages (TDIL)
program in 1991.


3. Enabling Minority Language Engineering (EMILLE) corpus 

This corpus has been developed as a result of a joint project between two
countries, India and the UK: CIIL Mysore (India) and Lancaster University (UK). The
complete details of this corpus can be found on the internet [151].


4. Central Institute of Indian Languages (CIIL) Corpus 

CIIL, in addition to collaborating on the well-known three-component
(monolingual, parallel, annotated) corpora project EMILLE [151] for the Indian
languages, has designed and collected linguistic material on dialects, dictionaries,
grammars, and phonetic readers for approximately 120 languages, speech varieties, major
languages, tribal languages, and relatively less known languages.


5. GyanNidhi Parallel Corpus 

This corpus has been developed by the Centre for Development of Advanced
Computing (C-DAC), Noida, India. It consists of more than one million pages of text in a
total of twelve languages. The Punjabi language is one of these twelve languages. The
other eleven languages, in alphabetical order, are Assamese, Bengali, English, Gujarati,
Hindi, Kannada, Malayalam, Marathi, Oriya, Tamil, and Telugu.


6. Consortia Projects of Dept of IT (MoCIT DIT consortia projects) 

The Ministry of Communications and Information Technology (MoCIT), Govt. of
India, through its Department of Information Technology (DIT), has been organizing
monolingual and parallel text corpora under various consortium projects.


7. The Hindi Samgraha project 
This is a mega project originated by the Mahatma Gandhi International Hindi 
University (MGIHU). 
4.3.2. Speech Corpora / Databases


Several major examples of the speech corpora /databases are briefly discussed 
here. 


1. Centre for Development of Advanced Computing (C-DAC) 


C-DAC is a dynamic organization sponsored by the Govt. of India. It operates
from its headquarters in Pune, and its ten centers, including those located at (in
alphabetical order) Bangalore, Chennai, Hyderabad, Kolkata, Mohali, New Delhi, Noida,
Pune and Trivandrum, have been actively engaged in speech and text corpora/database
development.


C-DAC Noida has been working on the development of speech corpora in
several languages including Hindi, Punjabi (a joint project in collaboration with CSIO)
and Marathi. Using a statistical analyzer tool known as Vishleshika, C-DAC Noida has
collected thousands of items in different categories (sets of the most frequent words,
phonetically rich sentence sets, and prosodically representative sentence sets) for
natural language and speech processing projects in speech synthesis systems.

C-DAC Noida has collaborated with many other organizations, agencies, and 
institutions actively engaged in speech corpora development and related activities. CSIO, 
ELDA and DRDO can be quoted as three prime examples. C-DAC Noida has 
collaborated with Central Scientific Instruments Organisation (CSIO) Chandigarh for a 
versatile speech corpora development in the Punjabi language. It has collaborated with a 
France-based organization Evaluation and Language resources Distribution Agency 
(ELDA) to generate a 2000-speaker database in Hindi language recorded in different 
environments (e.g., homes, streets, public places, offices and moving vehicles), different 
age groups, different genders, different regions and different dialects. It has collaborated 
with SAG of DRDO (the Scientific Analysis Group of the Defence Research and 
Development Organization) to develop similar speech and text corpora for Hindi, Bengali 
and Manipuri languages. 

Some examples of the speech corpora development of the other centers are as
follows: C-DAC Kolkata has been developing speech corpora (similar to the ones
described in the previous paragraph) in the Bengali, Assamese, and Manipuri languages,
C-DAC Trivandrum in the Malayalam, Tamil, and Telugu languages, and C-DAC Pune in
the Urdu, Sindhi, and Kashmiri languages. However, out of all the C-DAC centres, C-DAC
Noida is the only one working in the Punjabi language at present.


2. Indian Institutes of Technology (IIT) at Chennai, Delhi, Guwahati, Kanpur, 
Kharagpur, Mumbai 

Almost all major Indian Institutes of Technology throughout India have been
active in speech corpora/database development in various Indian languages at one
time or another. Some examples are: Chennai (Hindi, Tamil, Telugu), Delhi (Hindi,
Punjabi), Guwahati (Assamese, Manipuri), Kanpur (Hindi, Nepali), Kharagpur (Hindi,
Marathi, Urdu) and Mumbai (Marathi, Konkani). IIT Mumbai is involved in designing
the concatenative-synthesis Text-to-Speech (TTS) system called Vani.


3. Indian Institute of Science (IISc), Bangalore 

IISc, in addition to other speech corpora/database related activities, has been 
active in the development of a multichannel isolated word database for Indian English to 
be used in the telephone systems for speech recognition in Indian English. 


4. Central Electronics Engineering Research Institute (CEERI) Pilani and
Tata Institute of Fundamental Research (TIFR), Mumbai

These two organizations have been developing the speech corpora/databases to be 
used in the testing and design of speech recognition and synthesis systems in several 
languages including Hindi, Marathi, and English. One prime example of their 
collaborative work is a speech corpus development to be used in the voice-operated 
Railway Reservation System. 


5. Thapar University Patiala 

Thapar University led a 7-8 member team of speech researchers between 2000
and 2004. The team worked on an 88.8-lakh project known as the Resource Centre for
Indian Language Technology Solutions for the Ministry of Communications and
Information Technology (MoCIT), Govt. of India. The team developed LIKHARI (a
word processing software tool for the Punjabi language), which later led to a similar
package, AKHAR, at the Punjabi University Patiala. It is interesting to note that
LIKHARI means writer and AKHAR means word in the Punjabi language.


6. Punjabi University Patiala 

The Advanced Centre for Technical Development of Punjabi Language, 
Literature & Culture at the Punjabi University Patiala has been active in a variety of 
projects as implied by the name of the centre. The corpus development for the Punjabi 
language has been an on-going activity with the major objective of Text-to-Speech (TTS) 
Synthesis. 


7. International Institute of Information Technology (IIIT) Hyderabad

IIIT Hyderabad, while mainly concentrating on speech corpora / database
development projects in the Hindi and Telugu languages, has also been dealing with
other Indian languages.


8. Central Forensic Science Lab (CFSL) Chandigarh and 
Centre for Artificial Intelligence & Robotics (CAIR) Bangalore 

CFSL has designed a text-independent Speaker Identification database (SPID) for
forensic applications for the English, Hindi and Punjabi languages. Two more examples of
the CFSL corpora development work include a corpus recorded in the Hindi, Indian
English and Punjabi languages with native speakers of Punjabi, and another corpus in
the Hindi, Indian English and Kannada languages with native speakers of Kannada. The
two organizations (CFSL and CAIR) have joined hands to expand this speech
corpora/database development work to 10 Indian languages with 10 different channels,
distortion, and several disguises to address the specific needs of forensic applications.


9. Kamrah International Institute of Technology (KIIT) Gurgaon 


This institution has been developing a multilingual corpus consisting of more than 
10000 sentences and words in Punjabi, Hindi, Nepali and Indian English languages. 


In addition to the above well-known projects and institutions, other organizations 
known to be developing task-oriented special-purpose databases in India or several other 
corpora development projects mentioned in the literature include: 


1. HP Labs (in collaboration with IIIT Hyderabad), working on Automatic Speech
Recognition (ASR) and Text-to-Speech (TTS) applications in Hindi, Assamese and
Indian English
2. Utkal University, Bhubaneswar (Oriya)
3. Aligarh Muslim University (AMU), Aligarh (Hindi, Urdu, Arabic)
4. Bharati Vidyapeeth, Pune (speech synthesis word concatenation system database for
Marathi)
5. Prologics, Lucknow [51, pp. 94-97]
6. Bhrigus Software (I) Pvt. Ltd., Hyderabad [51, pp. 94-97]
7. Interactive Communications Systems (ICS), Hyderabad [51, pp. 94-97]


The institutions actively involved in text and speech corpora / database
development work for the Punjabi language (present and past) in India include Punjabi
University Patiala, Thapar University Patiala, C-DAC Noida & CSIO, KIIT, CFSL &
CAIR, IIT Delhi, and CIIL Mysore. The representative examples described in this section
have been a continuous source of inspiration for the development of many new corpora
such as the one designed in this work.


At present, the major challenges for research in speech processing in the
Punjabi language include corpora development and the efficient coding of these
corpora. This thesis addresses these two issues in Chapter IV and Chapter II,
respectively. The aim of this chapter is to develop a new and representative text and
speech corpus in the Malwai dialect of the Punjabi language.
4.4. OVERVIEW OF THE NEW CORPUS


The new corpus designed in this work is discussed in the following sections. This
corpus consists of approximately 300 items (sentences, and single line folk songs known
as bolis). The corpus is divided into two parts: Part A and Part B. Part A consists of
about 200 items (mainly sentences and about 10 bolis), whereas Part B consists of
slightly more than 100 bolis. This section presents an overview of both parts of the
corpus.


The first part (Part A) of the corpus consists of 52 sets of Punjabi speech
sentences: one set for each of the 35 letters in the Gurmukhi script, one set for each of
the 6 letters with a dot, known as bindi (.), placed below the letter symbol, and one set
each for the three conjunct consonants [h], [r], and [v]. The 45th set includes sentences
with short and long forms of some words leading to different meanings due to this
variation in the length of the pronunciation. The 46th set includes sentences that
describe the tonal nature of the Punjabi language using the phoneme h (ਹ). The next
(47th) set demonstrates the contrast between the five pairs of ten Gurmukhi vowels:
ਅ, ਆ; ਇ, ਈ; ਉ, ਊ; ਏ, ਐ; ਓ, ਔ. The last five (48th to 52nd) sets concentrate on the five
tonemes (ਘ, ਝ, ਢ, ਧ, ਭ) to illustrate the tonal nature of the Punjabi language. These
five sets illustrate the contrasts between the five tonemes and the five letters
immediately preceding these tonemes as follows: ਗ/ਘ; ਜ/ਝ; ਡ/ਢ; ਦ/ਧ; ਬ/ਭ.


The second part (Part B) of the corpus consists of 35 sets of bolis (each set
includes several items). Each item in this corpus concentrates on something special
about the Punjabi language, the Gurmukhi script, or the Malwai dialect.


Since this corpus will be mainly used for the linear prediction analysis and
synthesis of Punjabi speech sentences, complete sentences (rather than words
only) have been designed in this corpus. All categories and phonetic characteristics of the
speech sounds (e.g., voiced, unvoiced, nasal, non-nasal, aspirated, unaspirated) spoken
by male and female speakers from different villages and cities of several districts of
the Malwa region of the Punjab state of India have been tape-recorded. In particular,
sentences demonstrating tones, pairs of words with almost similar sounds, conjunct
consonants, reduplication, and extended pronunciation have been included. Some lines of
the popular folk songs have been used in their original form, whereas some lines have
been used in a modified form (see Section 4.7.8). Similarly, some single line folk songs
(bolis) have been used unaltered. Some new bolis have been written, since the author
[26-30] is a creative writer. The new bolis, written by the author, end with the double
dandi symbol (॥) to emphasize the originality of these bolis. Some words that are slowly
fading out from everyday use, as well as some theth pendu shabads (rustic words) and
expressions, have been recorded in the corpus.


In addition to Malwain (a female of the Malwa region) and Punjaaban (a female
of the Punjab state), the names of several villages and cities of the Malwa region
appear prominently in the bolis to highlight the fact that this work concentrates on the
Malwa region of the Punjab state in India. Some cities outside the Malwa region and
the Punjab state have also been mentioned in some bolis. Many popular names/nicknames
for Malwai males and females, the names of several freedom fighters, trees/crops,
birds, insects, and animals, and jewelry items of the Malwa region have been prominently
mentioned throughout the corpus.


The examples included in this chapter are sentences and bolis in the Punjabi 
language (written in the Gurmukhi script), followed by coding in International Phonetic 
Alphabet (IPA). In some cases, the examples have also been transcribed by using the 
newly designed ARPAbet-compatible phonetic coding scheme called PUNJARPAbet 
(designed in Chapter II). The complete corpus is given in Appendix A and Appendix B, 
where each sentence is transcribed in three different ways: the Punjabi language (Raavi), 
IPA, and PUNJARPAbet. Various categories of users (readers knowledgeable in the 
Punjabi language in Gurmukhi script, linguists knowledgeable in IPA, and the scientists 
knowledgeable in PUNJARPAbet) can equally understand and benefit from the corpus. 


4.5. RECORDING OF THE NEW CORPUS 


Speech sentences spoken by different Punjabi speakers from different villages and 
cities of different districts of the Malwa region of the Punjab state of India have been 
recorded using a cassette tape recorder as follows: 


CASSETTE I:

SIDE A
Speaker 1: Ranjit Singh, Male, 33 years, Longowal (Sangrur), 23 July 2007
Speaker 2: Gian Singh, Male, 48 years, Bahadurpur (Sangrur), 23 July 2007
Speaker 3: Daljit, Male, 58 years, Bathinda, 24 July 2007
Speaker 4: Seereen, Female, 25 years, Bathinda, 24 July 2007
Speaker 5: Surinderpal Kaur, Female, 54 years, Bathinda, 24 July 2007

SIDE B
Speaker 6: Swaranjit Kaur, Female, 26 years, Bhang Jadee (Mukatsar), 24 July 2007
Speaker 7: Gurpreet Kaur, Female, 26 years, Lambee Dhab (Mukatsar), 24 July 2007
Speaker 8: Gurdev Singh, Male, 24 years, Ferozepur, 25 July 2007
Speaker 9: Surinder Kaur Mann, Female, 53 years, Faridkot, 25 July 2007
Speaker 10: Swaranjit Kaur, Female, 58 years, Mansa, 25 July 2007


CASSETTE II:

SIDE A
Speaker 11: Talwinder Singh, Male, 39 years, Ferozepur, 25 July 2007
Speaker 12: Tek Singh, Male, 70 years, Mala Kalan (Moga), 25 July 2007
Speaker 13: Davinder Kaur, Female, 48 years, Samana (Patiala), 26 July 2007

SIDE B
Speaker 14: Bhim Raj Bansal, Male, 65 years, Barnala, 9 August 2012
Speaker 15: Surinderpal Singh, Male, 62 years, Ludhiana, 10 August 2012


Age Group: 
Males: 24, 33, 39, 48, 58, 62, 65, 70 
Females: 25, 26, 26, 48, 53, 54, 58 


Fifteen speakers were chosen at random from the various segments of the society 
(students, workers, professionals, villagers, and city residents) so that the corpus is as 
representative as practical. These speakers were from ten different districts of the Malwa 
region of the Punjab state (in alphabetical order) as follows: Barnala, Bathinda, Faridkot, 
Ferozepur, Ludhiana, Mansa, Moga, Mukatsar, Patiala, and Sangrur. 


4.6. TECHNICAL CONSIDERATIONS 


Technical considerations for recording the speech data include sampling 
frequency and resolution. 


4.6.1. Sampling Frequency 


If the highest frequency present in an analog signal is represented by $f_h$, and the
sampling frequency is represented by $f_s$, then according to Shannon’s Sampling Theorem,
any digital sequence generated from the analog signal using a sampling frequency $f_s$
that satisfies the inequality $f_s \ge 2 f_h$ is sufficient to represent the original analog
signal. If this inequality is not satisfied, then a phenomenon commonly known as aliasing
occurs, resulting in distortion when the analog signal is reconstructed from the
corresponding digital (sampled) sequence. Another way of stating this is that aliasing is
a problem caused by undersampled systems (sampling too slowly), where a frequency
higher than $f_s/2$ assumes the identity (or alias) of a lower frequency.


A related term is the Nyquist frequency (named after Harry Nyquist). It is half the
sampling frequency ($f_s$) and represents the upper bound on the frequencies that should be
contained in the digital signal. The Nyquist frequency is also known as the folding
frequency of a sampling system. If $f_s = 2 f_h$, i.e., the sampling frequency is exactly
equal to twice the highest frequency, then the sampling is said to be at the Nyquist rate.


It is better to choose the sampling rate $f_s$ to be slightly greater than $2 f_h$ to
account for practical hardware limitations. If $f_s > 2 f_h$, then the analog signal is said
to be oversampled.
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Four natural sampling frequencies (6.4 kHz, 8 kHz, 10 kHz, 16 kHz) have been 
used in practice [Rabiner and Schafer, 89, pp. 18-19] as summarized in Table 4.1. 


Table 4.1: Speech Bandwidth 

Speech Bandwidth         $f_h$ (kHz)   $f_s$ (kHz) 
Telephone                3.2           6.4 
Extended Telephone       4             8 
Oversampled Telephone    5             10 
Wideband (hi-fi)         8             16 


4.6.2. Resolution 


In addition to the sampling frequency, the second parameter is the resolution or 
word size used for recording data. Two values (8-bit or 16-bit) are in general use in 
commercially available systems. The 8-bit resolution ranges from $-2^{7}$ 
to $+(2^{7} - 1)$, whereas the 16-bit resolution ranges from $-2^{15}$ to $+(2^{15} - 1)$. For an efficient VoCoder 
from the viewpoint of low bit rate transmission (see Section 5.5.3), it is 
customary to use 8-bit resolution at the cost of slightly degraded speech quality. 


These two parameters (sampling frequency and resolution) must be known in 
order to play the recorded sound (speech or music) data files. In this project, a sampling 
frequency of $f_s = 8$ kHz and 8-bit resolution have been used. 
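
As a small worked example, the sample range and the raw (uncompressed) bit rate implied by these two parameters can be computed as follows. This is only a sketch: the helper names `sample_range` and `bit_rate` are hypothetical, a signed integer representation is assumed as in the ranges quoted above, and a single (mono) channel is assumed because the text does not state the number of channels.

```python
def sample_range(bits: int) -> tuple[int, int]:
    """Smallest and largest signed integer representable with the given word size."""
    return -(2 ** (bits - 1)), 2 ** (bits - 1) - 1


def bit_rate(f_s_hz: int, bits: int, channels: int = 1) -> int:
    """Raw bit rate in bits per second (channels=1 is an assumption)."""
    return f_s_hz * bits * channels


if __name__ == "__main__":
    print(sample_range(8))    # range of an 8-bit sample
    print(sample_range(16))   # range of a 16-bit sample
    print(bit_rate(8000, 8))  # raw bit rate at 8 kHz, 8 bits per sample
```

Under these assumptions, 8 kHz sampling at 8 bits per sample corresponds to a raw rate of 64,000 bits per second, half of what 16-bit samples would need at the same sampling frequency.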


4.7. ORIGINAL FEATURES OF THE NEW CORPUS 


This corpus consists of at least twenty original features. The features of this 
corpus have been described (exemplified with words in bold, wherever appropriate) in 
this section as follows: 


4.7.1. Uniqueness 


Each item of the corpus is unique. Each sentence/boli in the corpus has been carefully 
designed so as to convey something new and special about the Punjabi language [10, 13-14, 
21, 95-97, 111, 116, 128], the Malwai dialect [140], the Gurmukhi script [78, 107, 109-111, 
118], phonetics, vowels, consonants, and vowel signs (aka matras, accessory signs, or 
diacritical marks) by using a wide variety of vocabulary, idioms and expressions. In 
particular, a special feature called Anupras Alankar (ਅਨੁਪ੍ਰਾਸ ਅਲੰਕਾਰ) has been used 
throughout the development of the corpus (Anupras means “alliteration”). According to 
Wikipedia [155]: “In language, alliteration is the repetition of a particular sound in the 
prominent lifts (or stressed syllables) of a series of words or phrases.” Consequently, 
Anupras Alankar means the repeated usage of a word or a letter to enhance the artistic 
beauty of a piece of literature. In Indian languages and literature, the Anupras Alankar 
has been used for centuries. The idea behind the usage of the Anupras Alankar in this 
work is that the repeated use of a letter will enable the user to thoroughly analyze and 
synthesize the corresponding phoneme in a single item. Two examples from the new 
corpus (predominantly making use of the letters ਸ and ੜ respectively) to illustrate 
Anupras Alankar using the PUNJARPAbet are given below: 

AYE ASS HS, ESE" St SSE 

sUgor sUdol sUnokkhi, tolna nohi labboni. 

/S UH hG AH RH -S UH D: AOL-S UHN AH KKHIY, T:1 OW N: AA - 

N AH H JHn -L AH hBBN: IY// 


ust Dzait Had OS SA a, AA aH YS Sot 

pori corgi moroak nal lor ke, kar kar buha ponngi. 

/P AO RHTY - CH AH hRH GITY - M AH RH AHK -N AAL-— 

LAH RH - KEY, K AA RH - K AARH- B UWH AA - PI AHn NN GTY./ 


4.7.2. Speech Sounds 


All categories and phonetic characteristics of the speech sounds (e.g., voiced, unvoiced or 
voiceless, nasals, non-nasals, aspirated, unaspirated) have been recorded. All places of 
articulation (bilabials, labiodentals, dentals, alveolars, retroflexes, palatals, velars, 
uvulars, pharyngeals, and glottals) and manners of articulation (plosives, affricates, 
nasals, fricatives, laterals, rolled, flapped, and semi-vowels) have been represented 
throughout the corpus [131]. 


4.7.3. Selection of the words and sentences 

The words and sentences used in the corpus have been selected from books written by 
Sahitya-Academy Award-winning writers [147] of the Malwa region (the Sahitya-Academy 
Award is the highest literary award administered by the Government of India). 


The words and sentences have been selected from several books where the word Malwa 
is included in the title of the book [5, 64, 98, 103-105, 112-114, 140]. In particular, the 
bolis have been selected from the books by Sidhu [103], and N. Singh [112-114]. 


4.7.4. Tones (ਘ, ਝ, ਢ, ਧ, ਭ, ਹ) 


Punjabi is a tonal language [14,16-19,48,96,129]. Tonal languages make use of the 
relative pitch variations to signal lexical differences [129, pp. 596-597]. About a century 
ago (in 1914), T. Grahame Bailey [14, pp. xv] stated: “Variations in the tone of the voice 
form a very remarkable feature of Panjabi pronunciation. There are two special tones, 
apart from the ordinary tone of speaking. They occur in stressed syllables only.” In the 
later paragraphs on the same page, Bailey [14] describes the two special tones mentioned 
above as (a) low rising tone or rising-falling tone (b) high falling tone. The voiced 
aspirates (ਘ, ਝ, ਢ, ਧ, ਭ) in the initial position of words and stressed syllables represent 
the unaspirated voiceless (ਕ, ਚ, ਟ, ਤ, ਪ) followed by a low tone. For the non-initial 
(middle/medial or final) positions of the voiced aspirates (ਘ, ਝ, ਢ, ਧ, ਭ), there is a high 
tone on the vowel before the voiced unaspirated (ਗ, ਜ, ਡ, ਦ, ਬ), or a low tone on the 
vowel following the voiced unaspirated (ਗ, ਜ, ਡ, ਦ, ਬ). Five sets (set numbers 48-52) 
in the corpus have been designed to illustrate these tonemes. We present here two 
example sentences from the corpus, whereas more examples will follow later: 


1. Shure Ped wf SF wSser et fare | 


boglar mége kUmlar da kora kha gla. 
/B AH GITH AA RH- MEY hGEY - KI UH MIH AAR-D AA - 
Kl OW RH A - KH AA - GIH AA/ 


a: Ut & ug Ug, Ge Gant ct <a 
padi da pdd_pdddora, tUdd tarama di vddd. 
/P AAn hD TY - D AA - P AHn hD - P AH hDD AHR AA, TI UWn DD - 
Tl AH R AH M AAn- DIY - V AH hDD/ 


The consonantal value of the letter ਹ represents the phoneme /h/ when initial. The non-initial 
ਹ or the conjunct consonant ੍ਹ, in several words, is replaced with the appropriate high or 
low tone. Two examples from the corpus (set 46) are: 


1 o.uretute vot ut! 
pani pi ke cakki pi. 


/P AAN: IY - PIY - KEY -CH AHKK IY - PIYh/ 
2 « eaTewadtd fon, fest es rig egal 


vor han da kUri ni ml je, tIbbi Utte mi var je. 
V AHR-HAAN:-DAA-K UHRHIY -N UWn- MIJHL-JEY, 
/T TH BBIY - UH TT EY - M IHnh - V AHhR - J EY/ 


4.7.5. Reduplication (use of adak) 
In the corpus, several sentences have been designed to demonstrate that the meaning of a 
word can change with the addition of the sign called ੱ (adak), as in ਪਤਾ (means address) 
and ਪੱਤਾ (means leaf). Gill and Gleason [43] state: “Gemination is written by the sign 
/addak/ above and before the consonant to be doubled.” According to Bahri [13]: “Long 
(or double) consonants have an overhead crescent sign, called adhak, before them.” The 
following pairs of words are good examples of the effects of the reduplication: 


4.7.6. Extended Pronunciation (BHAT GUE) 


Examine the pair of words italicized and in bold in the following sentences from the 
corpus: 
lL Sa At ga Fo BN? 


pora ji! phuk paras li? 
/P!1 AH R AA - J TY! PH UW K - Pl AHR AA AH - LIY?/ 


> WAIT CAT J Tae! oO Yast AT" II 


khorka dorka ho ju kurie! na khorkao ni kUdda. 
/KH AH RH K AA -D AH RH K AA-HOW -J UW-K UHRHIY EY! 
N AA - KH AH RH K AA AH - NITY - K UHn D:D: AA./ 


Several sentences in the corpus (set 45) have been designed to demonstrate that in these 
pairs of words, the meaning of the word changes with the addition of // at the end of the 


word, and its pronunciation is also extended. That is why the new term Extended 
Pronunciation (SHA€e* Gage) has been coined by the author for this feature of the 


Punjabi language. 
4.7.7. Popular Folk Songs 


Some lines of the popular folk songs have been used in the original form. One example is 
from a Punjabi movie song: 


we uret fim 2 ot, Hoste war sddie od 


kUtt pani pla de ni, sonie kara pérédie nare 
/K1 UH T:T: - P AAN: TY - PIH AA - DEY - NIY, SOWHN: EYn - 
KI] AH RH AA- Pl AH EYn DIY EY- N AAR EY/ 


4.7.8. Modified form of Folk Songs 


Some lines of the popular folk songs have been used in the modified form. One example 
from the corpus is: 
wgMr SISt "3, Wait ad yf Uf 

kUggua tali ’te, kUggi kore ku ku. 

/K1 UH GG UW AA - T: AAHLTY - ’T EY, KI UH GGIY - K AHR EY - 
Kl UWn - KI UWn/ 


In the above example, the word ASH in the original boli was changed to wgMr, because 


new boli of the corpus was designed to study the tonal letter ?-4/. 


4.7.9. Bolis (Single line Folk Songs) 


Some single line folk songs (bolis) have been used. Two examples from the corpus (Part 
B) are given below: 


1. 


do SS [SS VAS, AS AAS Uae IT 
cann pavé nItt cdrda, sant sajjana de baj hanera 


ac feed aa euler, drdt 2 te fect 
chota dlor bara tUtt-pena, hassadi de ddd gInda 


4.7.10. New Bolis 


More than 40 new bolis have been written (the author [26-30] is a creative writer, and 
has published five books of original Punjabi poetry). The new bolis, written by the 
author, end with the symbol double dandi (||) to emphasize the originality of these bolis. 


Five examples are given below. These new bolis also illustrate the occurrence of each of 
the five tonemes (ਘ, ਝ, ਢ, ਧ, ਭ) in the initial, middle (medial), as well as the final 
positions (words in bold below) in a single boli: 


1. 


us ay Uby gout, os Awe aso agell 
kUd kddd pig cutdi, val sdgne kolola korde. 
Ws one t, ase Fate cell 

baj nasiba de, bUjan cana de dive. 

fs Geer HS, ust ut Ser 

tll kdddoda gere, pani pi thdda. 

de ead 4s ad 3d, aa feu de 3 TAI 
t3de vodge tan kUre tere, kSd4 vicc ked ho gai. 
fara sus, As Bet SoA 

plijgi pagpdari, sab loi pdrjaie. 


4.7.11. Rustic Words and Expressions 
Some theth pendu shabads (rustic words) and expressions that are slowly fading out from 
everyday use have been included in the corpus. Fading letters (e.g., ਙ, ਞ) have also been 
included. Some examples of the fading words and expressions are: 


1. vguddsvayugven dvgs ve? 
cetu core ne cakcudor calao ke carta cdd? 
2 UNldddldtEes, vas UT Sd! 
cora! cé cé ci ci chadd, cukné ’c car thokt! 
3. «Vd Sta WIE, G3 age fact ASS! 
canne nti cdnna samna, utte kanon siddi satir. 
4. ga-2ad ectadt wis was Shpr| 


tUk-tek ne totiri anti ossman thdSmmla. 


4.7.12. Variety of Speakers (sex and age) 


Speech sentences spoken by fifteen adult male and female speakers (age range: 24 to 70 
years) have been recorded for the speech corpus. 


4.7.13. Variety of Speakers (location and districts) 


Speech sentences spoken by fifteen Punjabi speakers from different villages and cities of 
ten districts of the Malwa region of the Punjab state of India have been recorded for this 
speech corpus. These districts (in alphabetical order) include: Barnala, Bathinda, 
Faridkot, Ferozepur, Ludhiana, Mansa, Moga, Mukatsar, Patiala, Sangrur. 


4.7.14. Names of Places 


In addition to Malwain (a female of the Malwa region) and Punjaaban (a female of the 
Punjab state), the names of twenty-four villages and cities of the Malwa region (in 
alphabetical order: Barnala, Dakha, Dangon, Daudhar, Duggri, Faridkot, Jagraon, Jaito, 
Jangpur, Jarg, Kaunke, Kotkapura, Ludhiana, Maurh, Moga, Mohi, Mukatsar, Nabha, 
Pandori, Patiala, Raikot, Sanghera, Sarabha, Sunam) prominently appear in some of the 
bolis and sentences of the corpus to highlight the fact that this work concentrates on the 
Malwa region of the Punjab state in India. Four examples are: 


1 vdaneese, fir so firr 
hari na molvene, gidda har gla 
2. Ha ue Adad, wey UATSE cI 


tamoak pove sorkare, onokh pajabon di. 
3.0 AST fw ard Aadtidase 
jeto da kIla tapa dt, je koddi ma di gal ve 

4. Se fls ano 2: At, AteryS, ea 
tin pld kajra de: mohi, jagpUr, dakha 


The above examples include references to four places in the Punjab state (Jaito, Mohi, 
Jangpur, Dakha) in addition to the prestigious words Malwain and Punjaaban. 


4.7.15. Slang Names of Places 


Some bolis mention the name of the city of Ludhiana as Ludhehana, since the Malwai 
people pronounce Ludhiana as Ludhehana. Similarly, places named Barnala, Faridkot, 
Jagraon, Moga, Mukatsar and Raikot have also been used in the corpus in their slang 
form as follows: Barnale = to/in/from Barnala, Faridkotia = A person from Faridkot, 
Faridkoto(n) = from Faridkot, Jagrava(n) = Jagraon, Mogio(n) = from Moga, Muksar = 
Mukatsar, Raikotee = A person from Raikot. 


One boli mentions the name of Amritsar as Ambarsar, since the Malwai people 
pronounce it as Ambarsar (the city of Amritsar is in the Majha region and not in the 
Malwa region of the Punjab state). Sanghol (of the Puadh region) has also been 
mentioned in the corpus. Cities outside of the Punjab state (Delhi-India, and London-UK) 
have also been mentioned in the corpus. Five illustrating examples are given below from 
the corpus: 


1. YAR HA Woe } Hs uqSs I"? 
mUksor moja manon nti mdter parne a? 
kdta kar sdgo] vala khu katt diige 

3. HAdctadcaydd, 3 yHe weno 
jetti kotkapure di, te bamon Sborsor da 

4. Bereas cad, da fest a act 
lUddehane ko! dUggoari, ris dIlli di kardi 

5. Gandiddl, du dae feo d oti 


dom vire di, tUmm lddon vice pe gi. 


4.7.16. Popular Names and Nick-names 


Many popular names and nick-names for the Malwai males and females are included in 
various parts of the corpus. Some examples of the male names are as follows: Amar, 
Ishar, Inder, Shera, Sunder, Seva, Sadhu, Himat, Hukma, Kewal, Kaila, Kharhak Singh, 
Genda, Jagira, Ginder, Megha, Chetoo, Chhinda, Jotee Jindal, Juala, Jhanda, Tehala, 
Thakar, Dogar, Dheroo, Kundha, Ganda Singh, Thamman, Dulloo, Dheedo, Pooran, 
Phakeereea, Phumman, Phaggoo, Bhola, Maghar, Suchcha, Khushia, Jindua, Ghudda, 
Jeonha Maurh, Bachana, Modan, Munshee. One sentence is given here as an example 


where the predominant phoneme is [@] = [ph], and the male name gat (phokiria) has 
been used in its slang form as eatsHte (phokirie): 


ead? & dex edited urfimr FT| 
phokirie da phUphor phoridkot6 ala si. 


Some examples of the female names included in the corpus are: Bachno, Bhago, 
Bhindo, Bhag Bharee, Bindo, Billo, Dhan Kur, Gelo, Jamalo, Jeeto, Jindro, Mindo, 


Nimmo, Ranhi, Ranho, Tarsemo. One example boli involving the female name nw ad (je 


kUre) is given here: 


Tat Hoge Fad ss, Fauta wet Aas 
haka marde bokkoria vale, dUdd pi ke jai je kUre 


4.7.17. Freedom Fighters (Martyrs) 


Young freedom fighters Shaheed Kartar Singh Sarabha, Shaheed Bhagat Singh, and 
Shaheed Udham Singh Sunam (Shaheed means Martyr), who sacrificed their lives during 
the independence struggle before 1947 are very popular amongst the youth of the Malwa 
region. They were hanged by the then British Rulers of India in November 1915, March 
1931, and July 1940 respectively. It is just a coincidence that Kartar Singh Sarabha and 
Udham Singh Sunam were born in the Ludhiana and the Sangrur Districts of the Malwa 
region respectively. All three of them have been respectfully mentioned in several 
sentences and bolis in this corpus. Two examples are given below: 


1. HAeugdhAss ass d, 8 Asses AAT 
jan corgi sorabe kortar di, che sarbale sajge. 
2. QOH, 33, AS, SH IA veel 


udom, pagat, sorabe, phasi hass carde. 


4.7.18. Trees/weeds/crops 
The following trees, weeds, and crops are either grown by seeding or naturally grown as 


weeds in the Malwa region: {SH (Margosa Tree), Sot (Jujube Tree), SJ (a kind of 
oblong fruit, a kind of cucumber). 3d¥YrI (water melon), MY (Potato), Sa (Clove), Ad 


(Rapeseed plant or crop, Black mustard), ATdT (a leafy vegetable: often mustard leaves), 
ear ST YS (farm of grams), SA/ASSA (Colocynth), SSH" (American Cotton), faad 


(Acacia), 33 (Mulberry). Two examples of the Bolis from the corpus including the 


names of four of the trees, weeds, and crops (last four items in the above list) are as 
follows: 


1. Sfmt des dea, de dine as feu saw 


tUmmla di vel vadd ke, jatt bijde khet vicc norma 


2. Ys sat A faas we wy, fers a 8 ferr gs ot fet 
mUda rohi di kIkkor da jatu , vla ke le gla tut di chlti 


4.7.19. Birds, Insects and Animals: 


Many birds, insects, and animals found in the Malwa region are frequently mentioned in 
everyday conversations and communications of the Malwai population. Some of these 


are: wait (Dove), fs3d (Partridge), ABlaES Kol (Indian Cuckoo, Nightingale), as 
(Tortoise, Turtle), aler-AY (A species of snake), wat (Cow/Ox), ast (Dog), aot 
(Donkey, Ass), Sd (Monkey), ast (Camel), ce (Pony, Hack), Het (Old female 
animal), AAS (Buffalo), Set (Adult young she Buffalo), Yd Ser (Adult young he 
Buffalo, Brown Color), @fdazar (Young Bullock), ectact (sandpiper, plover, peewit, 
pewit), wat (Mare), fey (Kite, Bird of Prey), 3s (Wasp, Yellow jacket), 3g (Sheep). 
Three examples of the bolis from the corpus including the names of the birds, insects and 
animals (last three in the above list) are as follows: 
1 why cede Ur, fag fees cree nist 

okkh potvaron di, jI6 [ll de alone ada 
2. ez astdst-dai, faa wear gafont Ao 

cher ke parlda-ragia, kItthe dato bubsult sada 
3. «sa Retoe th Sot rect 33 

tor sUkinon di, ta ki jandi péde 
4.7.20. Jewelry Items 


Males and females of the Malwa region are fond of wearing jewelry items generally 
made of silver and gold. Many jewelry items have been prominently mentioned in many 


sentences and bolis of the corpus: 84d An ornament for the ankle (females), SHA 


Dome shaped pendant for the ear, Eardrop (females), S*e* a silver ornament for the ankle 
(females), Sd nose pin or nose stud (worn by females), aod" Necklace (worn by males), 
abe (Thick Bracelet or Bangle, usually made of Gold or Silver, worn by females), €5* 
(Bangles of Glass, worn by females), 63M (Ear rings for males), fugs-usmt (an 


ornament for the ear, worn by females). Three examples of the sentences and bolis from 
the corpus including names of the four jewelry items (last four in the above list) are as 
follows: 


1. ane searGele! Cu vw Hise our 
kSnon chonkaUdie! dUdd ’c minona na pa 


on zentor, afent fegs dei aud fa exsmiy 


vonjarla, bavla vére vana vecdé ke vajlia? 


3. wig & Ss wan St fitus-ush, far as ais at act 
a le nattia kdrao li pIppal-pottia, kIse kole gall na kori 


4.8. CONCLUSION 


In this chapter, a new text and speech corpus for the Punjabi language has been 
designed. The new corpus is designed with special reference to one of the major dialects 
of the Punjabi language in India: the Malwai dialect. At least 20 special features of the 
new corpus have been described in this chapter. A new term Extended Pronunciation 


(SHa=" Gade) has been coined by the author to describe a feature of the Punjabi 


language where the meaning of the word changes with the addition of // at the end of 


the word, and its pronunciation is also extended. 


The corpus consists of approximately 300 items. Only a small number of 
examples have been given in this chapter due to space limitations. The complete corpus 
has been given in Appendix A and Appendix B, where each item is transcribed in three 
different ways: the Punjabi language (Gurmukhi script), IPA, and PUNJARPAbet. 
Various categories of users (readers knowledgeable in the Punjabi language in Gurmukhi 
script, linguists knowledgeable in IPA, and the scientists knowledgeable in 
PUNJARPAbet) can equally understand and benefit from the corpus. 


The fact that the new corpus is very rich and versatile due to a wide variety of its 
linguistic and cultural features makes it an ideal corpus for any serious speech processing 
work in the Punjabi language. 


CHAPTER V 
LINEAR PREDICTION ANALYSIS AND 
SYNTHESIS OF PUNJABI SPEECH* 


5.1. INTRODUCTION 


In this chapter, all the theoretical concepts required to analyze and synthesize 
speech are discussed. These concepts have been implemented to synthesize the Punjabi 
speech sentences towards the end of this chapter. Topics discussed in this chapter include 
Linear Prediction Analysis and Synthesis of speech; Pitch, Speech synthesis, and tones; 
Analysis/synthesis considerations; Voice Coders (VoCoder); Data compression; 
Computational savings; and the list of the Punjabi sentences synthesized in this project. 


The linear prediction technique is also known as Linear Predictive Coding or LPC 
[Rabiner and Schafer, 89, p. 473]. The term linear predictive coding (LPC) refers to 
several formulations which are essentially equivalent as far as the modeling of the speech 
signal is concerned. At least seven different formulations have been mentioned in the 
literature by various authors [76, 87-89, 99]. We will discuss two of these formulations, 
the covariance method and the autocorrelation method, in Section 5.2.1 because they are 
sufficient to complete this project. 


No work has so far been reported in the literature where the Punjabi speech has 
been analyzed/synthesized using the linear prediction model of speech production. This 
thesis, therefore, concentrates on the speech analysis and synthesis of the Punjabi 
language using the linear prediction model of speech production. The linear prediction 
model of speech production, also known as the ‘linear prediction model’, utilizes an all-pole 
digital filter to represent the characteristics of the speech signal $s_n$ over the short segments 
under consideration. The transfer function $H(Z)$ of the all-pole filter is expressed as 

$$H(Z) = \frac{S(Z)}{U(Z)} = \frac{G}{1 + \sum_{j=1}^{p} a_j Z^{-j}} \tag{5.1}$$


In statistics, the all-pole model is termed the autoregressive (AR) model, because the 
output is said to regress on itself [24, pp. 273]. In Eq. (5.1), $G$ is the gain factor; the $a_j$’s are 
the linear prediction (LP) coefficients, and $p$ is the number of LP coefficients or 
predictor coefficients (the order of the model and the number of poles mean the same 
thing). The value of $a_0$ is normalized to unity. 


* Some parts of the contents of this chapter have been published in the international journal PARKH, Vol. I, Jan-June 2013, pp. 179-188. 
Other parts have been presented at the WORLD CONFERENCE on INNOVATION and COMPUTER SCIENCES (INSODE-2014), 
Rome, Italy, Apr 2014, and will be published in the journal AWERProcedia Information Technology and Computer Science. 


In Linear prediction analysis, we determine the four control parameters (as 
defined in Section 5.2) of the linear prediction model (Fig. 5.1), due to Atal and Hanauer 
[11-12, 99] directly from the speech waveform. The objective of Linear prediction 
synthesis is to synthesize the same speech waveform by utilizing these four control 
parameters as an input to the linear prediction synthesis model or linear prediction 
synthesizer (Fig. 5.2), due to Atal and Hanauer [11-12, 99]. 


[Figure 5.1: block diagram not reproduced. Surviving labels: voiced excitation, unvoiced excitation, time-varying linear predictor.] 

Figure 5.1: Linear prediction model of speech production 
(a) Time-domain representation (b) Frequency-domain representation 
(due to Atal and Hanauer [12]) 


With reference to Fig. 5.1 and Fig. 5.2, the three variables ($e_n$, $u_n$, and $v_n$) are defined 
as follows: 


(a) $e_n$ is the error between the actual value of the speech sample $s_n$ and its predicted value $\hat{s}_n$. 
The error $e_n = s_n - \hat{s}_n$ is also called the residual, residual error, or prediction error. 

(b) $u_n$ is the excitation function for the voiced sounds. Its value is zero except for one 
sample at the beginning of every pitch period. 

(c) $v_n$ is the excitation function for the unvoiced sounds. $v_n$ is stationary white noise, i.e., a 
sequence of unit-variance, zero-mean random numbers from a random number 
generator. 
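
The two excitation functions defined in (b) and (c) can be sketched in a few lines of Python. The function names, the fixed pitch period, and the seeded Gaussian generator are assumptions made for illustration; the text itself only requires unit-variance, zero-mean random numbers for $v_n$.

```python
import random


def voiced_excitation(n_samples: int, pitch_period: int) -> list[float]:
    """u_n: one unit sample at the start of every pitch period, zero elsewhere."""
    return [1.0 if n % pitch_period == 0 else 0.0 for n in range(n_samples)]


def unvoiced_excitation(n_samples: int, seed: int = 0) -> list[float]:
    """v_n: zero-mean, unit-variance white noise (Gaussian is an assumption)."""
    rng = random.Random(seed)
    return [rng.gauss(0.0, 1.0) for _ in range(n_samples)]


if __name__ == "__main__":
    u = voiced_excitation(200, pitch_period=80)  # 80 samples = 10 ms at 8 kHz
    v = unvoiced_excitation(200)
    print([n for n, x in enumerate(u) if x != 0.0])  # positions of the pitch pulses
    print(sum(v) / len(v))                           # sample mean of the noise
```

In a synthesizer, one of these two sequences would be selected frame by frame according to the voiced/unvoiced decision and scaled by the gain factor.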


5.2. LINEAR PREDICTION ANALYSIS 


The basic idea behind Linear Prediction analysis (LP analysis) is that a speech 
sample can be estimated, approximated, or predicted as a linear combination of the past $p$ 
speech samples. By minimizing the sum of the squared differences (over a finite interval 
of the speech segment) between the actual speech samples ($s_n$) and the predicted speech 
samples ($\hat{s}_n$), a unique set of $p$ coefficients, known as LP coefficients or predictor 
coefficients, can be computed. The finite interval of the speech segment represents a new 
vocal tract configuration in the speech production, and the LP coefficients are the 
weighting coefficients in the linear combination. 


Linear prediction analysis aims to evaluate the following four control parameters 
of the linear prediction model (Fig. 5.1) due to Atal and Hanauer [11-12, 99]: 
i) $a_j$; $j = 1, 2, 3, \ldots, p$ (LP coefficients) 
ii) $G$ (Gain factor) 
iii) V/UV (Voiced / Unvoiced decision) 
iv) $P$ (Pitch period for voiced speech) 
In this project, these parameters have been determined from the speech samples 
using well-known mathematical techniques discussed in the following subsections. 


5.2.1. LINEAR PREDICTION COEFFICIENTS 
The all-pole model of Eq. (5.1) can be characterized by a difference equation of the form: 

$$s_n = -\sum_{j=1}^{p} a_j s_{n-j} + G u_n \tag{5.2}$$

The excitation function $u_n$ is zero except for one sample at the beginning of every pitch 
period for voiced sounds. Thus 

$$s_n = -\sum_{j=1}^{p} a_j s_{n-j}; \quad n > 0 \tag{5.3}$$


Eq. (5.3) implies that for $n > 0$, the speech sample $s_n$ is a linear combination of the 
previous $p$ samples. Stated slightly differently, it means that the current speech sample 
$s_n$ is linearly predictable from the previous $p$ samples. This is where the name Linear 
Prediction comes from. If the data to be modeled corresponds exactly to the all-pole 
model of Eq. (5.1), then Eq. (5.3) will be satisfied exactly. Since the model is not perfect 
in this sense, the linearly predicted sample will only be a close approximation to $s_n$. Let 
us denote the predicted sample as $\hat{s}_n$, so that: 

$$\hat{s}_n = -\sum_{j=1}^{p} a_j s_{n-j}; \quad n > 0 \tag{5.4}$$

Let us define an error $e_n$ between the actual value of the speech sample $s_n$ and the value 
$\hat{s}_n$ predicted by Eq. (5.4) as the difference $s_n - \hat{s}_n$: 

$$e_n = s_n + \sum_{j=1}^{p} a_j s_{n-j} = \sum_{j=0}^{p} a_j s_{n-j}; \quad n > 0 \tag{5.5}$$

where $a_0 = 1$. The error $e_n$ is also referred to as the residual, residual error, or prediction error. The 
linear prediction coefficients $a_j$’s are chosen so as to minimize the total squared error (of 
the frame under consideration) given by 

$$E = \sum_n e_n^2 \tag{5.6a}$$

$$E = \sum_n \Bigl( s_n + \sum_{j=1}^{p} a_j s_{n-j} \Bigr)^2 \tag{5.6b}$$

In order to solve for the set of LP coefficients $a_j$’s, Eq. (5.6) is differentiated with 
respect to each coefficient $a_k$. The result of the differentiation is then set equal to zero as follows: 

$$\frac{\partial E}{\partial a_k} = 0; \quad 1 \le k \le p \tag{5.7}$$
This leads us to the following set of linear equations: 

$$\sum_{j=1}^{p} a_j \sum_n s_{n-j} s_{n-k} = -\sum_n s_n s_{n-k}; \quad 1 \le k \le p \tag{5.8}$$

The minimum total squared error $E_p$ is given by equations (5.6) and (5.8) as: 

$$E_p = \sum_n s_n^2 + \sum_{j=1}^{p} a_j \sum_n s_n s_{n-j} \tag{5.9}$$

Eq. (5.8) and Eq. (5.9) have been derived considering only the voiced sounds of Eq. 
(5.3), where the excitation function is an impulse at the beginning of the pitch period. 
The same results can be obtained for the unvoiced sounds, where the excitation function $v_n$ is 
stationary white noise (a sequence of unit-variance, zero-mean random numbers from 
a random number generator), as we can see in the following equations (5.10-5.14). The 
unvoiced sounds are defined as follows: 

$$s_n = -\sum_{j=1}^{p} a_j s_{n-j} + v_n \tag{5.10}$$

The predicted sample for the unvoiced sounds is again formed from the past $p$ samples: 

$$\hat{s}_n = -\sum_{j=1}^{p} a_j s_{n-j} \tag{5.11}$$

Because $s_n$ for unvoiced sounds is a sample of a random process, the residual or 
residual error $e_n = s_n - \hat{s}_n$ is also a sample of a random process [74]. We can minimize the 
expected value (where $\langle \cdot \rangle$ stands for expectation) of the square of the error. As a result, 
we have: 

$$E = \langle e_n^2 \rangle = \Bigl\langle \Bigl( s_n + \sum_{j=1}^{p} a_j s_{n-j} \Bigr)^2 \Bigr\rangle \tag{5.12}$$

Since $v_n$ is uncorrelated with the past samples $s_{n-k}$ ($k \ge 1$), the cross terms 
$\langle v_n s_{n-k} \rangle$ are zero. Applying Eq. (5.7) to Eq. (5.12) gives: 

$$\sum_{j=1}^{p} a_j \langle s_{n-j} s_{n-k} \rangle = -\langle s_n s_{n-k} \rangle; \quad 1 \le k \le p \tag{5.13}$$

The minimum total squared error $E_p$ is expressed as 

$$E_p = \langle s_n^2 \rangle + \sum_{j=1}^{p} a_j \langle s_n s_{n-j} \rangle \tag{5.14}$$

Speech is a nonstationary process, but it can be considered locally stationary due to its 
slowly time-varying quasi-periodic nature [12, 74, 87-89]. So, replacing the expectations in 
Eq. (5.13) and Eq. (5.14) with time averages over the frame gives the same equations as Eq. (5.8) and Eq. (5.9). 


Considering that $s_n$ is non-zero for $0 \le n \le N-1$ (where $N$ is the speech segment length or 
frame length in terms of number of samples), and on applying the Z-transform to Eq. 
(5.5), we get 

$$E(Z) = \Bigl[ 1 + \sum_{j=1}^{p} a_j Z^{-j} \Bigr] S(Z) = A(Z)\, S(Z) \tag{5.15}$$

where 

$$A(Z) = 1 + \sum_{j=1}^{p} a_j Z^{-j} \tag{5.16}$$

is an all-zero filter known as the Inverse Filter or Prediction Error Filter [74, 76, 99]. The 
name inverse filter comes from the fact that $A(Z)$ is an inverse filter for the system $H(Z)$ 
of Eq. (5.1) rewritten as follows: 

$$H(Z) = \frac{G}{A(Z)} \tag{5.17}$$


The residual or the prediction error $e_n$ can therefore be considered as the result of 
passing $s_n$ through the inverse filter $A(Z)$. This important observation is exploited in some 
of the pitch detection algorithms (e.g., the SIFT algorithm described in Appendix D). 
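
The relationship between the all-pole synthesis filter and the inverse filter can be checked numerically with a minimal Python sketch. The function names `synthesize` and `inverse_filter`, the second-order coefficients, and the zero initial conditions are assumptions chosen for illustration; they are not the parameters used in this project.

```python
def synthesize(a: list[float], gain: float, excitation: list[float]) -> list[float]:
    """All-pole synthesis, Eq. (5.2): s_n = -sum_j a_j s_{n-j} + G u_n."""
    p = len(a)
    s: list[float] = []
    for n, u in enumerate(excitation):
        # Past samples before n = 0 are taken as zero (assumed initial condition).
        past = sum(a[j - 1] * s[n - j] for j in range(1, p + 1) if n - j >= 0)
        s.append(-past + gain * u)
    return s


def inverse_filter(a: list[float], s: list[float]) -> list[float]:
    """Prediction error filter A(Z), Eq. (5.5): e_n = s_n + sum_j a_j s_{n-j}."""
    p = len(a)
    return [
        s[n] + sum(a[j - 1] * s[n - j] for j in range(1, p + 1) if n - j >= 0)
        for n in range(len(s))
    ]


if __name__ == "__main__":
    a = [-1.2, 0.72]                                        # illustrative stable coefficients
    u = [1.0 if n % 10 == 0 else 0.0 for n in range(40)]    # impulse train excitation
    s = synthesize(a, gain=2.0, excitation=u)
    e = inverse_filter(a, s)
    print(max(abs(en - 2.0 * un) for en, un in zip(e, u)))  # residual minus G*u_n
```

When the signal matches the all-pole model exactly, the residual reduces to the scaled excitation, so the pitch pulses reappear in $e_n$. Pitch detectors based on inverse filtering rely on this property.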


The limits of summation in equations (5.6), (5.8) and (5.9) were purposely left 
unspecified. There are two basic approaches to this question, leading to two different 
formulations of linear prediction analysis. These two formulations are the first two 
of the seven equivalent formulations of linear prediction mentioned in the literature, as listed 
below: 

1. the covariance method [Atal and Hanauer (11-12), 99] 
2. the autocorrelation method [Makhoul (74); Markel and Gray (76)] 
3. the lattice method [Burg (87-89), Makhoul (74)] 
4. the inverse filter formulation [Markel and Gray (76)] 
5. the spectral estimation formulation [Burg (87-89)] 
6. the maximum likelihood formulation [Itakura and Saito (52), 99] 
7. the inner product formulation [Markel and Gray (76)] 


As pointed out by Rabiner and Schafer [87-89], “the differences mainly are often 
philosophical or in point of view toward the problem of speech modeling. The differences 
mainly concern the details of the computations used to obtain the predictor coefficients”. 
The authors discuss only first three formulations (or three basic methods of analysis) in 
their latest book [89, pp. 474] stating that “all the other formulations are essentially 
equivalent to one of these three. The importance of linear prediction lies in the accuracy 
with which the basic model applies to speech.” 


We will discuss only two formulations, which are sufficient for the completion of 
this project in the following two sub-subsections: 


5.2.1.1. AUTOCORRELATION METHOD 


In this method the speech segment (frame) is assumed to be identically zero outside 
the interval $0 \le n \le N-1$. This can be achieved by multiplying $s_n$ by a finite-length 
window (e.g., a Hamming window) that is identically zero outside the interval 
$0 \le n \le N-1$. The corresponding prediction error E is non-zero over the interval 
$0 \le n \le N-1+p$: 

$$E = \sum_{n=0}^{N-1+p} e_n^2 \tag{5.18}$$

Eq. (5.8), then, becomes 

$$\sum_{n=0}^{N-1+p} s_{n-j}\, s_{n-k} = \sum_{n=0}^{N-1-|j-k|} s_n\, s_{n+|j-k|} = R(|j-k|); \quad 0 \le j,\, k \le p \tag{5.19}$$

Equations to be solved for this method, therefore, are: 

$$\sum_{j=1}^{p} a_j R(|j-k|) = -R(k); \quad 1 \le k \le p \tag{5.20}$$

and the minimum mean square prediction error of Eq. (5.9), for this method, becomes: 

$$E_p = R(0) + \sum_{j=1}^{p} a_j R(j) \tag{5.21}$$

where the autocorrelation coefficients for equations (5.12-5.13) are specified by: 

$$R(m) = \sum_{n=0}^{N-1-m} s_n\, s_{n+m}; \quad 0 \le m \le p \tag{5.22}$$
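
As an illustration of Eq. (5.22), the following minimal Python sketch computes the autocorrelation coefficients of one Hamming-windowed frame. It is not the program used in this work: the function name `autocorrelation`, the use of NumPy, and the 200 Hz test sinusoid are assumptions made only for this example.

```python
import numpy as np

def autocorrelation(frame, p):
    """Return R(0), ..., R(p) of a Hamming-windowed frame, following Eq. (5.22)."""
    s = np.asarray(frame, dtype=float) * np.hamming(len(frame))
    N = len(s)
    # R(m) = sum over n = 0 .. N-1-m of s_n * s_{n+m}
    return np.array([np.dot(s[:N - m], s[m:]) for m in range(p + 1)])

# Hypothetical test frame: 160 samples of a 200 Hz sinusoid sampled at 8 kHz.
n = np.arange(160)
frame = np.sin(2 * np.pi * 200 * n / 8000)
R = autocorrelation(frame, p=12)
r = R / R[0]   # normalized coefficients; their magnitude never exceeds 1
```

Only lags 0 to p are needed, so the cost is about Np multiplications per frame.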


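The set of Eq. (5.20), known as ‘Normal Equations’ in least squares terminology [71, 
74, 91], can be expressed in the matrix form as follows: 

$$
\begin{bmatrix}
R(0) & R(1) & R(2) & \cdots & R(p-1) \\
R(1) & R(0) & R(1) & \cdots & R(p-2) \\
R(2) & R(1) & R(0) & \cdots & R(p-3) \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
R(p-1) & R(p-2) & R(p-3) & \cdots & R(0)
\end{bmatrix}
\begin{bmatrix} a_1 \\ a_2 \\ a_3 \\ \vdots \\ a_p \end{bmatrix}
= -
\begin{bmatrix} R(1) \\ R(2) \\ R(3) \\ \vdots \\ R(p) \end{bmatrix}
\tag{5.23}
$$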


This $p \times p$ matrix of autocorrelation coefficients is a Toeplitz matrix, named after 
Otto Toeplitz [152], and it is a special matrix in linear algebra. It is a diagonal-constant 
matrix, i.e., all the elements along any diagonal are equal, and it can be defined with only 
the first row and the first column. A linear system involving an $n \times n$ Toeplitz matrix 
has only $2n-1$ degrees of freedom rather than $n^2$ degrees of freedom, and it is easier to 
solve. In addition, the Toeplitz matrix involved in Eq. (5.23) is also symmetric and positive 
definite (note that a symmetric Toeplitz matrix is defined by just one row). These special 
properties of Eq. (5.23) lead to an efficient solution by the Levinson-Robinson or 
Levinson-Durbin recursive algorithm [Levinson (71), Robinson (91), 11, 15, 24, 99]. From the 
algorithm design and analysis point of view, the computational complexity of the 
Levinson-Durbin algorithm is on the order of $p^2$, as compared with the Gaussian 
elimination technique or the matrix inversion technique, whose computational complexity 
is on the order of $p^3$. Even more efficient algorithms have been proposed, but they are 
unacceptable to most speech researchers because they are numerically unstable [15, 
pp.125]. A brief description of this algorithm has been included in Appendix C for the 
sake of completeness. 
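
The recursion can be sketched in Python as follows. This is a minimal illustration under the sign convention of Eq. (5.20), not a reproduction of Appendix C; the function name `levinson_durbin` and the small test autocorrelation sequence are assumptions made for this example.

```python
import numpy as np

def levinson_durbin(R, p):
    """Solve sum_j a_j R(|j-k|) = -R(k), 1 <= k <= p, in about p^2 operations.

    Returns the predictor coefficients a_1..a_p, the reflection
    coefficients k_1..k_p and the final prediction error E_p.
    """
    R = np.asarray(R, dtype=float)
    a = np.zeros(p + 1)
    a[0] = 1.0                      # a_0 = 1 by convention; not returned
    k = np.zeros(p + 1)
    E = R[0]                        # E_0 = R(0)
    for i in range(1, p + 1):
        # k_i = -[R(i) + sum_{j=1}^{i-1} a_j R(i-j)] / E_{i-1}
        acc = R[i] + np.dot(a[1:i], R[i - 1:0:-1])
        k[i] = -acc / E
        prev = a.copy()
        a[i] = k[i]
        for j in range(1, i):
            a[j] = prev[j] + k[i] * prev[i - j]
        E *= 1.0 - k[i] ** 2        # E_i = (1 - k_i^2) E_{i-1}
    return a[1:], k[1:], E

# Hypothetical autocorrelation sequence R(m) = 4 * 0.5**m (a first-order process).
R = np.array([4.0, 2.0, 1.0, 0.5])
a, k, E = levinson_durbin(R, p=3)
```

For this test sequence only the first coefficient is non-zero, and the final error agrees with Eq. (5.21). The result can be checked against a general-purpose solver applied to the Toeplitz system of Eq. (5.23).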


If all the autocorrelation coefficients are scaled by a constant, then the solution to Eq. 
(5.23) remains unchanged. In particular, if all R(j) are normalized by dividing by R(0), 
the resulting coefficients r(j) are called normalized autocorrelation coefficients: 

$$r(j) = \frac{R(j)}{R(0)} \tag{5.24}$$

Since $R(0) \ge |R(j)|$ always, $|r(j)| \le 1$. The normalized error (normalized residual 
energy) from Eq. (5.21) is: 

$$\bar{E}_p = \frac{E_p}{R(0)} = 1 + \sum_{j=1}^{p} a_j\, r(j) = \prod_{j=1}^{p} \left(1 - k_j^2\right) \tag{5.25}$$

where $k_j$, $1 \le j \le p$, are intermediate quantities in the solution process of the 
Levinson-Robinson (Levinson-Durbin) algorithm (Appendix C). These intermediate quantities 
are known as Reflection Coefficients (PARCOR Coefficients), with the special property that 
$|k_j| < 1$. Therefore the normalized residual energy has the property that 

$$0 < \bar{E}_p \le 1 \tag{5.26}$$


In order to ensure reliable results by using this method, the speech segment length 
N should be of the order of several pitch periods (N = 2P, i.e., two pitch periods, has 
been used in this work). 


5.2.1.2. COVARIANCE METHOD 


In the covariance method [11-12], we assume that the prediction error E in Eq. 
(5.6) is minimized over a finite interval $0 \le n \le N-1$ and the signal is assumed to be 
known for the interval $-p \le n \le N-1$ (a total of p + N samples). No assumptions are made 
about the signal outside this interval and no windowing is necessary (N can be different 
from that used in the autocorrelation method): 

$$E = \sum_{n=0}^{N-1} e_n^2 \tag{5.27}$$

Eq. (5.8), then, becomes 

$$\sum_{j=1}^{p} a_j\, \varphi(j,k) = -\varphi(0,k); \quad 1 \le k \le p \tag{5.28}$$

and the minimum mean square prediction error is 

$$E_p = \varphi(0,0) + \sum_{j=1}^{p} a_j\, \varphi(0,j) \tag{5.29}$$

where 

$$\varphi(j,k) = \sum_{n=0}^{N-1} s_{n-j}\, s_{n-k}; \quad 0 \le j,\, k \le p \tag{5.30}$$

In matrix form, the $p \times p$ matrix of Eq. (5.28) is symmetric and positive definite but 
not Toeplitz, and the solution is generally obtained by the Cholesky decomposition 
method (or square root method). 
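
A minimal Python sketch of the covariance method under the definitions of Eq. (5.28) to Eq. (5.30) is given below. The function name `covariance_lpc`, the storage layout of the samples, and the noise-free second-order test signal are assumptions made for this illustration; NumPy's general solver is used for the two triangular systems for brevity.

```python
import numpy as np

def covariance_lpc(s, p):
    """Covariance-method LPC. s holds p + N samples; s[p + n] stores s_n, -p <= n <= N-1."""
    s = np.asarray(s, dtype=float)
    N = len(s) - p
    # Row j holds s_{n-j} for n = 0 .. N-1, so phi[j, k] = sum_n s_{n-j} s_{n-k}.
    rows = np.stack([s[p - j:p - j + N] for j in range(p + 1)])
    phi = rows @ rows.T
    # Solve sum_j a_j phi(j,k) = -phi(0,k) with the Cholesky factor L (Phi = L L^T).
    L = np.linalg.cholesky(phi[1:, 1:])
    y = np.linalg.solve(L, -phi[0, 1:])
    a = np.linalg.solve(L.T, y)
    E = phi[0, 0] + np.dot(a, phi[0, 1:])   # Eq. (5.29)
    return a, E

# Hypothetical noise-free signal obeying s_n = 1.2 s_{n-1} - 0.5 s_{n-2}.
s = np.zeros(50)
s[0], s[1] = 1.0, 0.5
for n in range(2, 50):
    s[n] = 1.2 * s[n - 1] - 0.5 * s[n - 2]
a, E = covariance_lpc(s, p=2)   # a is approximately [-1.2, 0.5], E is approximately 0
```

Because no window is applied, the method recovers the coefficients of this noise-free test signal essentially exactly, which the windowed autocorrelation method would not do.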


While choosing a method, computational efficiency and stability are two major 
considerations. 


The autocorrelation method requires somewhat less computation (Np multiplications 
for the correlation matrix [44-45] and about $p^2$ multiplications for the solution to the 
matrix equations by the Levinson-Robinson algorithm) than the covariance method (Np 
multiplications for the correlation matrix and $(p^3 + 9p^2 + 2p)/6$ multiplications, p 
divisions and p square roots for the solution to the matrix equations by the Cholesky 
decomposition method; exact figure by Portnoff et al. [74, 76, 87-89, 99]). 


In the autocorrelation method all the roots of A(Z) lie inside the unit circle in the 
Z-plane, which means that the stability of H(Z) is guaranteed, whereas no such assurance 
exists in the case of the covariance method [11-12, 74, 76, 87-89, 99]. So the practical 
advantages of the autocorrelation method over the covariance method are obvious. In the 
present work, as in most speech analysis research, the autocorrelation method has been used. 


5.2.2. GAIN FACTOR (G) 


It is reasonable to expect that the gain factor (or gain) G could be determined by 
matching the energy in the speech signal with the energy of the linearly predicted 
samples [88, pp. 404]. From Eq. (5.2) and Eq. (5.5), the residual error is given by 

$$e_n = G u_n = s_n + \sum_{j=1}^{p} a_j\, s_{n-j} \tag{5.31}$$

Since the error $e_n$ is proportional to the input $u_n$, it is a reasonable assumption that 
the energy in the input signal is equal to the energy in the error signal [74, 76, 79], given 
as $E_p$ in Eq. (5.21). Therefore we have: 

$$G^2 \sum_{n} u_n^2 = \sum_{n} e_n^2 = E_p \tag{5.32}$$

For an input of unit total energy (e.g., a unit impulse), the gain factor G is therefore 
given by: 

$$G = \sqrt{E_p} = \left( R(0) + \sum_{j=1}^{p} a_j R(j) \right)^{1/2} \tag{5.33}$$

This expression for the gain factor G has been used by Makhoul [74], Markel & Gray 
[76] and Oppenheim [79]. 


Another method for calculating the gain factor was proposed by Atal and Hanauer [11- 
12] and further improved by Klayman et al. [76, 87-89] on the basis of input-output 
energy matching in the original and the synthesized speech. According to these authors, 
the transmitted gain is a measure of energy per sample and is, hence, simply equal to the 
RMS value of the input signal: 

$$G = \left( \frac{1}{N} \sum_{n=0}^{N-1} s_n^2 \right)^{1/2} \tag{5.34}$$

Therefore, the input-output energy for the actual speech signal and the synthesized 
speech signal is matched by replacing the synthesized sample $s_n$ by $\beta \times s_n$, where: 





(5.35) 


In the present work, the RMS formulation for gain has been used for analysis and 
synthesis (mainly due to its accuracy and simplicity), although both formulations 
described in Eq. (5.33) and Eq. (5.34) are equally acceptable to most speech researchers 
[87-89, 99]. 
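
The two gain formulations can be compared with the short Python sketch below. The function names are hypothetical. The form of the energy-matching factor, taken here as the ratio of the RMS values of the original and the synthesized frames, is an assumption consistent with the description of input-output energy matching and is not taken from the cited works.

```python
import numpy as np

def gain_from_prediction_error(R, a):
    """Eq. (5.33): G = sqrt(R(0) + sum_j a_j R(j))."""
    R = np.asarray(R, dtype=float)
    a = np.asarray(a, dtype=float)
    return float(np.sqrt(R[0] + np.dot(a, R[1:len(a) + 1])))

def gain_rms(frame):
    """RMS formulation: square root of the mean energy per sample."""
    frame = np.asarray(frame, dtype=float)
    return float(np.sqrt(np.mean(frame ** 2)))

def energy_matching_factor(original, synthesized):
    """Assumed scale factor: ratio of the RMS values of the two frames."""
    return gain_rms(original) / gain_rms(synthesized)

# Hypothetical values chosen so that the results are easy to verify by hand.
G_err = gain_from_prediction_error([4.0, 2.0, 1.0, 0.5], [-0.5, 0.0, 0.0])
G_rms = gain_rms([3.0, -3.0, 3.0, -3.0])
beta = energy_matching_factor([3.0, -3.0, 3.0, -3.0], [1.0, -1.0, 1.0, -1.0])
```

Scaling the synthesized frame by `beta` gives it the same energy as the original frame, which is the sense in which the input and output energies are matched.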


5.2.3. V/UV DECISION AND PITCH EXTRACTION 


The problem of V/UV decision is to determine whether the vocal cords are 
vibrating during the generation of a short speech segment under consideration, or not. If 
the vocal cords are vibrating, then the speech segment consists mainly of voiced sounds. 
The problem of pitch extraction, then, is to determine the pitch period P, where P is the 
reciprocal of the fundamental frequency $F_0$ (the rate of oscillation of the vocal cords is 
called the fundamental frequency, pitch frequency, or pitch). 


The following remarks by some of the well-known researchers in this area can be 
considered representative as well as interesting: 


“A thorough discussion of published techniques for fundamental frequency or pitch 
period estimation would probably be as long as this book”. 
- [Markel and Gray,76, pp. 190] 


“Of the numerous systems for pitch extraction that have been proposed, none is free from 
deficiencies either in performance or in excessive complexity”. 
- [Maksym, 76, pp. 281] 


“Nevertheless, no single estimator yet developed offers decisive advantages in either 
reliability or computational simplicity, a fact which attests to the difficulty of the 
problem”. 

- [Tucker and Bates, 28, pp. 597] 


“No single pitch detector was uniformly top ranked across all speakers, recording 
conditions and error measurements”. 
- [Rabiner et al., 99, pp. 209] 


“It is the firm belief of this author that all of the proposed methods have their merits, and 
in fact, that they yield similar performance in reliability. Preference of one approach over 
another is primarily determined by the particular application in which such a system is to 
be used”. 

- [Knorr, 27, pp. 264] 


Taking into account the complexity and the reliability of dozens of the pitch 
detection algorithms available in literature [12, 15, 75, 76, 82-83, 86-90, 99-102, 122- 
126, 129], the SIFT (Simplified Inverse Filter Tracking) algorithm, due to Markel [75-76, 
24, 87-89, 99] has been applied in the earlier works of the author [26, 28-29, 31] for the 
voiced/unvoiced decision and the pitch extraction. The SIFT algorithm (described in 
Appendix D) is used mainly because it is based upon linear prediction and claimed by 
Markel to be “efficient and accurate” as compared with some of the other techniques 
based on the linear prediction principles [76, pp. 206] due to Atal and Hanauer [12], S. 
Boll [15], and Itakura and Saito [52, 99]. The Gold-Rabiner pitch tracker [82, 86-89] and 
the Autocorrelation pitch tracker [82, 87-89] have also been used in the earlier work of 
the author [30]. 
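
To make the idea behind an autocorrelation pitch tracker concrete, a deliberately simplified Python sketch is given below. It is not the SIFT algorithm, the Gold-Rabiner tracker, or the tracker used in the earlier work. The function name `autocorr_pitch`, the search range of 50 to 400 Hz, and the voicing threshold of 0.3 are assumptions chosen only for illustration.

```python
import numpy as np

def autocorr_pitch(frame, fs=8000, f_min=50.0, f_max=400.0, threshold=0.3):
    """Return (voiced, pitch period in samples) for a single frame."""
    x = np.asarray(frame, dtype=float)
    x = x - x.mean()
    R = np.correlate(x, x, mode="full")[len(x) - 1:]   # R(0), R(1), ...
    if R[0] <= 0.0:
        return False, 0                                # silent frame: unvoiced
    lo = int(fs / f_max)                               # shortest period searched
    hi = min(int(fs / f_min), len(x) - 1)              # longest period searched
    lag = lo + int(np.argmax(R[lo:hi + 1]))
    voiced = R[lag] / R[0] > threshold                 # strong peak means voiced
    return bool(voiced), (lag if voiced else 0)

# Hypothetical test: a 100 Hz sinusoid at 8 kHz has a period of 80 samples.
n = np.arange(400)
voiced, P = autocorr_pitch(np.sin(2 * np.pi * 100 * n / 8000))
```

A practical tracker would add centre clipping or inverse filtering before the correlation and would smooth the pitch contour across frames.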


Since many pitch detection algorithms are available in literature, it was decided to 
experiment with pitch detection algorithms other than the above three algorithms (the 
SIFT algorithm, the Gold-Rabiner pitch tracker, and the Autocorrelation pitch tracker). It 
was not an easy task to decide which new algorithm should be tried, since “there is no 
single theory that comprehensively accounts for the apparently simple task of pitch 
perception, and this is perhaps reflected in the complexity of designing methods of 
automatic pitch extraction” [123, pp. 33], and “no algorithm has been found that performs 
robustly in all cases in accordance with human perception” [82, pp. 152]. 


In addition to the above three pitch detection algorithms, two more pitch detection 
algorithms that have gained wide acceptance due to their stability and computational 
efficiency are: 


(a) ZFF (Zero Frequency Filtering) 
(b) RAPT (Robust Algorithm for Pitch Tracking). 


These two algorithms have also been used in this project. A brief description of these two 
algorithms is given below. 


1. The Zero Frequency Filtering (ZFF) algorithm as described by its 
originators [122] is summarized here in point form: 


(i) Zero Frequency Filtering (ZFF) is a technique used in the characterization and 
analysis of glottal activity from speech signals. 


(ii) The filter design originally proposed has an infinite impulse response (IIR) filter 
followed by two successive finite impulse response (FIR) filters. 


(iii) A simplified FIR implementation is employed in the latest version. The advantage of 
the FIR filter implementation is realized in the reduction of the computational 
requirements for zero frequency filtering which include two important 
characteristics: (a) use of single-precision floating point, and (b) stability of the 
filter. 


2. The Robust Algorithm for Pitch Tracking (RAPT) is based on normalized 
cross-correlation function (NCCF) of Atal [12] and SIFT algorithm of Markel [75-76]. 
RAPT has gained popularity because it “has been used with satisfactory results on speech 
recordings varying in quality from noisy telephone to quiet laboratory conditions... The 
algorithm has been embedded in a commercially available speech-processing package 
and is in widespread use in speech research laboratories” [124, page 515]. 


Summarizing the discussion regarding the V/UV decision and pitch extraction, we 
have tried various algorithms for pitch detection (SIFT, Gold-Rabiner pitch tracker, 
Autocorrelation pitch tracker, ZFF, RAPT) for this work. No significant improvement has 
been noticed in the overall performance of the speech synthesizer when the different 
pitch detection algorithms were used. The well-known result that no single pitch 
detection algorithm is uniformly top-ranked across all environments [90, 99] has been 
confirmed. 


5.3. LINEAR PREDICTION SYNTHESIS OF SPEECH 


Speech can be synthesized [22, 41, 49, 59, 100, 125, 126] by utilizing the four 
control parameters of the linear prediction analysis as input to a system having the same 
parametric representation as the analysis model. Figure 5.2 shows the linear prediction 
synthesizer due to Atal and Hanauer [12]. 


[Figure 5.2: Linear prediction synthesizer (due to Atal and Hanauer [12]). Block labels 
recoverable from the diagram: pitch, V/UV switch, white noise, adaptive predictor, 
predictor coefficients, low pass filter, speech s(t).] 


If the control parameters are updated at the beginning of a pitch period using a 
variable frame length every time, then the synthesis is called pitch synchronous synthesis, 
whereas if the control parameters are updated once every time for a fixed-length frame, 
then the process is called pitch asynchronous synthesis [12, 76, 87-89, 99]. Pitch 
asynchronous synthesis requires interpolation of the control parameters, which is not very 
simple [12], and the interpolation of the $a_j$'s can even lead to an unstable filter. 
Therefore, the pitch synchronous synthesis is generally preferred [87-89]. 


The pitch synchronous synthesis in the present work has been performed by using a 
variable frame length of 2P samples for a voiced frame and a fixed frame length of 160 
samples (20 ms at 8 kHz) for an unvoiced frame (P is the pitch period). Using an impulse 
generator and a white noise generator (producing a zero-mean, unit-standard-deviation, 
uncorrelated random sample sequence $v_n$) as the excitation sources for voiced and 
unvoiced sounds respectively, the synthesis can be represented by these equations: 


(i) voiced sounds ($0 \le n \le N-1$, $N = 2P$): 

$$s_n = -\sum_{j=1}^{p} a_j\, s_{n-j} + G u_n; \quad u_n = 1 \text{ for } n = 0,\ n = P; \quad u_n = 0 \text{ for } n \ne 0,\ n \ne P \tag{5.36a}$$

(ii) unvoiced sounds ($0 \le n \le N-1$, $N = 160$): 

$$s_n = -\sum_{j=1}^{p} a_j\, s_{n-j} + G v_n \tag{5.36b}$$


The linear prediction synthesizer (Fig. 5.2) implements Eq. (5.36) and requires p 
multiplications and p additions to synthesize a single output sample $s_n$. 
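
The synthesis recursion for one frame can be sketched in Python as follows. This is a minimal illustration rather than the program used in this work. The function name `synthesize_frame`, the way the filter memory is carried from frame to frame, and the first-order test filter are assumptions made for this example.

```python
import numpy as np

def synthesize_frame(a, G, voiced, P=None, N=160, history=None, rng=None):
    """Synthesize one frame with s_n = -sum_j a_j s_{n-j} + G * excitation."""
    a = np.asarray(a, dtype=float)
    p = len(a)
    if voiced:
        N = 2 * P                      # voiced frame spans two pitch periods
        u = np.zeros(N)
        u[0] = u[P] = 1.0              # one impulse at the start of each period
    else:
        rng = np.random.default_rng() if rng is None else rng
        u = rng.standard_normal(N)     # zero mean, unit standard deviation
    # past[0] is s_{n-1}, past[1] is s_{n-2}, and so on
    past = np.zeros(p) if history is None else np.array(history, dtype=float)
    out = np.empty(N)
    for n in range(N):
        out[n] = -np.dot(a, past) + G * u[n]   # p multiplications, p additions
        past = np.concatenate(([out[n]], past[:-1]))
    return out, past

# Hypothetical first-order filter so that the output can be checked by hand.
voiced_frame, memory = synthesize_frame([-0.5], G=2.0, voiced=True, P=4)
unvoiced_frame, _ = synthesize_frame([-0.5], G=2.0, voiced=False,
                                     history=memory, rng=np.random.default_rng(0))
```

Passing the returned memory into the next call keeps the output continuous across frame boundaries when the control parameters are updated.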


5.4. PITCH, SPEECH SYNTHESIS AND TONES 


This section is mainly based on the description of pitch variation given by 
Bhaskararao [129, pp. 596-597], which is reproduced below. (Several words have been 
italicized and/or set in bold to emphasize the connection amongst the concepts of pitch, 
speech synthesis, and tones. These points should be appropriately addressed in any 
speech synthesis project.) 


1. “Pitch is the perceptual correlate of fundamental frequency. All natural 
languages use relative variations in pitch to bring out intonational differences. 
Intonational differences can signal syntactic differences (such as differences between 
interrogative and declarative sentences) or to bring out paralinguistic features such as 
emotional and attitudinal differences on the part of the speaker.” 

2. “Although it may be too early to model paralinguistic features of intonation at 
the current state of speech synthesis, intonational patterns have to be modelled for quality 
speech synthesis.” 

3. “In addition to the use of relative pitch variation for syntactic purpose, some 
languages use relative pitch variations to signal lexical differences. Such languages are 
called tone languages. Several of the languages of the Tibeto-Burman family within India 
are tone languages. The number of contrastive tones that these languages use may vary 
from a minimum of two tones to a maximum of five tones. For instance, Manipuri has a 
two-way contrast of tones whereas Mizo has a three-way contrast.” 

4. “While synthesizing tones for a tonal language, it should be remembered that it 
is the relative variation of pitch that defines the tonal distinction but not any absolute 
values of fundamental frequency.” 

5. “As tones in a tonal language carry lexical load, the built-in lexicon of the 
language should carry tone markings and the synthesizer should be able to generate the 
appropriate tonal pattern for the given word.” 

6. “Another unique case of contrastive pitch variation occurs in Panjabi. An 
underlying rewrite process of script letter sequences into phoneme sequences occurs here. 
In Panjabi the written sequences {bʰV, dʰV, ɖʰV, jʰV, gʰV, hV} are realized 
phonemically as: /pV́, tV́, ʈV́, cV́, kV́, V́/. Here V́ stands for a vowel with a high tone. 
Hence, speech synthesis for Panjabi will have to handle the transfer of each of the five 
members {bʰV, dʰV, ɖʰV, jʰV, gʰV} of voiced aspirated plosive set to the corresponding 
member from the voiceless unaspirated plosive set. In addition, the corresponding vowel 
has to be assigned the appropriate fundamental frequency. Notice that the letter sequence 
{hV} is transformed in to the phoneme /V́/. Dogri also has a similar set of rules but its 
writing has evolved using a separate ‘accent mark’ (svara chihna) to take care of this 
change.” 


Bhaskararao [129, pp. 596-97] gives us a broader perspective by mentioning the 
Manipuri, Mizo, and Dogri languages in addition to the Punjabi language. His point 
number 6 above has been clearly stated in section 4.7.4 of this thesis. The two special 
tones in Punjabi are: 

(a) low rising tone (or rising-falling tone) 

(b) high falling tone 

These are simply called the low tone and the high tone, respectively. 


The voiced aspirates (ਘ, ਝ, ਢ, ਧ, ਭ) in the initial position of words and stressed 
syllables (mentioned in point 6 above as {bʰV, dʰV, ɖʰV, jʰV, gʰV}) represent the 
unaspirated voiceless (ਕ, ਚ, ਟ, ਤ, ਪ) followed by a low tone (mentioned in point 6 
above as /pV́, tV́, ʈV́, cV́, kV́/ in a different order). The consonantal value of the letter ਹ 
represents the phoneme /h/ when initial, but non-initial ਹ, in several words, is replaced 
with tone (mentioned in point 6 above as {hV} and /V́/ respectively). 


5.5. SPEECH PROCESSING CONSIDERATIONS 


Before we examine the specific considerations for speech analysis and synthesis, 
we need to look at some general speech processing considerations including analog and 
digital speech signals, speech compression, speech coders, and voice coders (vocoders). 


5.5.1. Analog and digital speech signals 


The inventor of telephone A. G. Bell wrote to a friend named Watson the 
following message [40, pp.3]: “Watson, if I can get a mechanism which will make a 
current of electricity vary its intensity as the air varies in density when sound is passing 
through it, I can telegraph any sound, even the sound of speech”. 


A. G. Bell based his invention (telephone) on the basic principle that the speech 
signal, like many other analog signals “is represented by measuring and reproducing the 
amplitude fluctuations of the acoustic waveform of the speech signal” [40 (pp. 3), 89 (pp. 
663)]. The digital speech signal corresponding to the analog speech signal is often 
represented by speech samples or a sequence of numbers equivalent to amplitude 
variations. 


Digital speech signals are generally sampled at $f_s$ = 8 kHz (8000 samples/sec), and 
each sample is represented by b = 8 bits. This corresponds to an uncompressed 
information rate (I) of 64 kbps (kilobits/sec) [154]. All sentences in this project have 
been recorded at this rate. 


5.5.2. Speech Compression 


The website called Speech Compression [154], based on the works of the famous speech 
researchers Atal [11-12], Rabiner and Schafer [87-89], Deller, Proakis, and Hansen [24], 
Furui [41], Schroeder [100], Goldberg and Riek [47], Childers [22] and others, states that 
“The compression of speech signals has many practical applications. One example is in 
digital cellular technology where many users share the same frequency bandwidth. 
Compression allows more users to share the system than otherwise possible. Another 
example is in digital voice storage (e.g. answering machines). For a given memory size, 
compression allows longer messages to be stored than otherwise... With current 
compression techniques (all of which are lossy), it is possible to reduce the rate to 8 kbps 
with almost no perceptible loss in quality. Further compression is possible at a cost of 
lower quality. All of the current low-rate speech coders are based on the principle of 
linear predictive coding (LPC).” This fact further enhances the significance of the present 
work, because LPC is our main focus. 


5.5.3. Speech Coders 


The art of reducing the bit rate required to represent a speech signal is called speech 
coding. The algorithms used to reduce the required bit rate are called speech-coding 
algorithms, or speech coders [Kleijn, 15, pp. 283-84]. Two broad classes of the digital 
speech coding systems, commonly known as speech coders (A-to-D converter and D-to- 
A converters) have been mentioned in literature. These have been presented in the 
following subsections: 


5.5.3.1. Waveform Coding Systems or Waveform Coders 


In all waveform coders, sampling and quantization are fundamental features. In order to 
digitally represent the speech waveform sampled at $f_s = 1/T$ (assuming that the number of 
bits for representing one sample is equal to b), the information rate $I_w$ in these systems 
can be written as: 

$$I_w = b \times f_s$$

This process for representing the speech waveform is known as Pulse Code Modulation 
(PCM). The PCM technique was developed by Shannon and others in a classic paper 
(The Philosophy of PCM) published in November 1948. 


5.5.3.2. Vocoder Systems 


These systems are also known as hybrid coders, or model-based systems, or 
analysis/synthesis systems. The goal of this class of digital speech coding system (or 
model-based coders) is to preserve the perceived speech quality and speech intelligibility 
at an information rate ($I_m$) lower than the information rate ($I_w$) for the waveform 
coders. Assuming the analysis frame rate ($F_s$, in frames per second) and $b_f$ bits required 
to represent one frame, an expression for $I_m$ can be written as: 

$$I_m = b_f \times F_s$$

The basic goal of these vocoder systems is to achieve $I_m < I_w$ with the speech quality of 
the synthesized speech comparable with the original input speech signal. 


5.5.4. Voice Coders (Vocoders) 


Vocoders or voice coders [Dudley (33, 99)], also known as analysis/synthesis methods, 
can operate at rates between 1 and 3 kb/sec [37 (ch.5), 82 (ch. 4 and 5)]. Three vocoders 
generally associated with Linear Predictive Coding (LPC) are: Multipulse-excited 
LPC (MLPC), Residual-Excited Linear Prediction (RELP) vocoder, and Line Spectrum 
Pairs (LSP). Whereas the MLPC and RELP vocoders are typically used for bit rates 
around 9600 bits/sec, the LSPs have been used to achieve bit rates around 2400 bits/sec 
[82], comparable with the low bit rate reported by Atal and Hanauer [12, 99], Benesty et 
al. [15], Goldberg and Riek [47], Rabiner and Schafer [87-89], Kondoz [65], and others. 


In addition to the above vocoders, several other representative vocoders frequently 
mentioned in literature [154, 23, 47, 65, 77] include the following: 


• Adaptive Differential Pulse Coded Modulation (ADPCM) scheme 

• Low-Delay Code Excited Linear Prediction (LD-CELP) scheme 

• Conjugate-Structured Algebraic Code Excited Linear Prediction (CS-ACELP) 
scheme 

• Code Excited Linear Prediction (CELP) scheme. The bit rate is 0.6 bits/sample 
(compression ratio of 13.3:1). 

• Linear Predictive Coding (LPC10) scheme 


We have practically implemented the LPCp scheme in this chapter, where p is the number 
of poles. We have used p = 2, 4, 6, 8, 10, and 12. The relative speech compression 
performance of the ADPCM, LD-CELP, CS-ACELP, CELP, and LPC10 vocoder schemes as 
compared with the original signal is given in Table 5.1, whereas the bit rates in bits per 
second (bps) for LPCp (with p = 2, 4, 6, 8, 10, and 12) are tabulated in Table 5.2. 


Table 5.1: Vocoders and Speech Compression 

Item No.   Vocoder     Bit rate (kbps)   Quantization (bits/sample)   Compression Ratio 
1          Original    64                8                            1:1 
2          ADPCM       32                4                            2:1 
3          LD-CELP     16                2                            4:1 
4          CS-ACELP    8                 1                            8:1 
5          CELP        4.8               0.6                          13.3:1 
6          LPC10       2.4               0.3                          26.6:1 
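
The entries of Table 5.1 follow directly from the bit rates. The short Python sketch below recomputes the bits per sample and the compression ratio for each coder, assuming the 8 kHz, 8 bits/sample original stated in section 5.5.1. The variable names are chosen only for this illustration.

```python
FS_HZ = 8000            # sampling frequency of the original signal
BITS_PER_SAMPLE = 8     # quantization of the original signal
original_bps = FS_HZ * BITS_PER_SAMPLE

# Bit rates in kbps as listed in Table 5.1.
coders_kbps = {"ADPCM": 32, "LD-CELP": 16, "CS-ACELP": 8, "CELP": 4.8, "LPC10": 2.4}

summary = {}
for name, kbps in coders_kbps.items():
    bits_per_sample = kbps * 1000 / FS_HZ        # e.g. 4800 / 8000 = 0.6
    ratio = original_bps / (kbps * 1000)         # e.g. 64000 / 4800 = 13.33
    summary[name] = (bits_per_sample, ratio)
    print(f"{name:9s} {kbps:5.1f} kbps  {bits_per_sample:4.1f} bits/sample  {ratio:5.2f}:1")
```

The ratio for LPC10 evaluates to about 26.67, which Table 5.1 lists as 26.6:1.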























5.6. ANALYSIS/SYNTHESIS CONSIDERATIONS 


The quality of the output speech synthesized from the analysis control parameters depends 
on several considerations, such as the choice of methods, windowing, pre-emphasis, 
sampling rate ($f_s$), order of the model (p), length of the analysis frame (N), and tones in 
a tonal language such as the Punjabi language. Although some of these have been discussed 
in the appropriate sections, all of these will be summarized here for completeness. 


Considering accuracy, storage and computation, the sampling frequency $f_s$ = 8 kHz is 
generally taken as a representative sampling rate [12, 75-76, 79, 85-89, 99-102], where 
the speech signal is bandlimited to less than 4 kHz (cut-off frequency $f_c$ = 3.9 kHz) 
bandwidth to avoid aliasing. A Nyquist frequency of 4 kHz (or lower) is commonly 
assumed for “telephone speech” [89, pp.667]. Sampling frequency $f_s$ = 8 kHz has been 
used in the present work. 


The order of the model (i.e., the number of $a_j$'s) depends mainly on the sampling rate. 
One pole per kHz of sampling frequency (equivalently, one complex pole pair per kHz of 
signal bandwidth) is required to represent the vocal tract, and 3-4 poles are required to 
represent source excitation and lip radiation. For $f_s$ = 8 kHz, a value of p equal to 11 
or 12 is needed. Atal & Hanauer [11-12] showed in a graph that the prediction error 
decreases only by a small amount when p is increased beyond 12. A value of p less than or 
equal to 12 is used in this work. 


The process of passing the signal through a single-zero filter $1 - \mu z^{-1}$ 
($0.9 \le \mu \le 1.0$) before the analysis is known as pre-emphasis. Pre-emphasis before 
analysis is a procedure used to estimate the spectral characteristics of the vocal tract 
alone without the effects of the glottal and lip radiation characteristics. Pre-emphasis has 
been, therefore, used in the SIFT algorithm to sharpen the autocorrelation peaks in pitch 
detection but not for the estimation of the $a_j$'s, because it leads to additional 
complexities and undesirable effects in the synthesis spectral properties such as low 
frequency boost [75-76]. 
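
The single-zero pre-emphasis filter corresponds to the difference equation $y_n = s_n - \mu s_{n-1}$. A minimal Python sketch is given below. The function name `pre_emphasize`, the default value of 0.95, and the convention of passing the first sample through unchanged are assumptions made for this illustration.

```python
import numpy as np

def pre_emphasize(s, mu=0.95):
    """Apply y_n = s_n - mu * s_{n-1}; the first sample is passed through unchanged."""
    s = np.asarray(s, dtype=float)
    y = s.copy()
    y[1:] -= mu * s[:-1]
    return y

# A constant (zero-frequency) input is strongly attenuated after the first sample.
y = pre_emphasize([1.0, 1.0, 1.0, 1.0], mu=0.9)
```

The attenuation of the constant input shows the high-pass character of the filter, which is why it flattens the spectral tilt contributed by the glottal source.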


Although a rectangular window is implicit in the autocorrelation method, an explicit 
window such as the Hamming window, which tapers $s_n$ down to zero, is recommended to 
check the spectral distortion effects of discontinuities at $s_0$ and $s_{N-1}$. A Hamming 
window has been used both in the estimation of the $a_j$'s and the pitch. 


A variable frame length, equal to 2P for voiced frames and equal to 20 ms for unvoiced 
frames, has been used. The combination of pitch synchronous analysis and synthesis is 
less complex than the pitch asynchronous analysis and synthesis combination. Therefore, 
pitch synchronous analysis and synthesis has been used in this work. 


As far as the choice of methods in the control parameter estimation is concerned, the 
autocorrelation method for the $a_j$'s has been chosen due to its stability and computational 
efficiency. The RMS formulation for the gain factor G is chosen for its accuracy and 
simplicity. The ZFF and RAPT methods have been used in the two MATLAB [36, 121] 
programs written by the author for this project to compute the V/UV decision and the 
pitch extraction for their efficiency. The SIFT algorithm has been discussed in Appendix 
D for its efficiency and close relationship with linear prediction. 


Tones in the Punjabi language, and the Punjabi speech synthesis are also a special 
consideration. These have been discussed in Chapter I, Chapter IV (4.7.4), Section 5.4, 
and Chapter VI under the appropriate titles of Tones, Tonemes (Tonal Phonemes), or 
Tonal Systems. 


With reference to the all-pole model, two points have been emphasized by many 
authors [40, 74, 79, 87-89, 99]: 

(1) The speech waveform is sufficiently complex so that we cannot expect it to match 
exactly even a pole-zero model, let alone the simplified all-pole model as that of Eq. 
(5.1). 

(2) It is only a good compromise that the simplicity of the all-pole model can preserve 
many of the important characteristics of the speech signal at the cost of some 
accuracy. 


5.7. ANALYSIS AND SYNTHESIS (COMBINED) 


The website titled “LPC Analysis and Synthesis of Speech” presents a compact 
description of the “speech compression technique known as Linear Prediction Coding 
(LPC) using DSP System Toolbox™ functionality” [153] available at the MATLAB 
command line [36, 121]. The website explains this technique using the block diagram 
(Fig. 5.3) by combining the two steps (analysis and synthesis) as outlined below. Let us 
call these two steps step A and step B. 


Step A: 

The analysis section divides the original speech signal into frames of size 3200 samples 
(0.4 seconds) with an overlap equivalent to 0.2 seconds (that translates to 1600 samples). 
After passing each frame through the Hamming window, the autocorrelation coefficients 
R(j) for number of poles p = 12 are found. Using the Levinson-Durbin algorithm 
(Appendix C), the reflection coefficients are calculated from these autocorrelation 
coefficients. Once the reflection coefficients are extracted from the original speech signal, 
the original speech signal is passed through an all-zero analysis filter (with coefficients as 
the reflection coefficients) to compute the residual signal. 


Step B: 

The synthesis section reconstructs the original signal using the residual signal and 
reflection coefficients as input to a synthesis filter (which is the inverse of the analysis 
filter). 
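
The relationship between step A and step B can be illustrated with the Python sketch below. It is not the MATLAB simulation described above: it uses direct-form filters with predictor coefficients $a_j$ instead of lattice filters with reflection coefficients. The function names and the second-order test coefficients are assumptions made for this example.

```python
import numpy as np

def analysis_filter(s, a):
    """All-zero filter A(z): residual e_n = s_n + sum_j a_j s_{n-j}."""
    s = np.asarray(s, dtype=float)
    e = s.copy()
    for j, aj in enumerate(a, start=1):
        e[j:] += aj * s[:-j]
    return e

def synthesis_filter(e, a):
    """All-pole filter 1/A(z): s_n = e_n - sum_j a_j s_{n-j}."""
    out = np.zeros(len(e))
    for n in range(len(e)):
        acc = e[n]
        for j, aj in enumerate(a, start=1):
            if n - j >= 0:
                acc -= aj * out[n - j]
        out[n] = acc
    return out

# Hypothetical stable coefficients: A(z) has zeros at z = 0.5 and z = 0.4.
a = [-0.9, 0.2]
signal = np.random.default_rng(0).standard_normal(200)
residual = analysis_filter(signal, a)
reconstructed = synthesis_filter(residual, a)
```

Because the synthesis filter is the exact inverse of the analysis filter, the reconstruction matches the input up to rounding error when the residual is not quantized.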


This simulation plots the Signal and LPC spectrum (Fig. 5.4) using the two helper 
functions from the MATLAB [153]. These two helper functions are called: (1) hfigslpc.m 
(2) plotlpcdata.m. 


The combined analysis and synthesis scheme results in significant computational savings 
(as described in the next section), since the residual signal and reflection coefficients 
require fewer bits to code than the original speech signal. 

5.8. COMPUTATIONAL SAVINGS 

Atal and Hanauer [12, 37, 99] concluded that the overall bit rate for transmission [65] and 
storage of speech required for good quality synthesized speech is $(5p + 12) \times F_s$ 
bits/sec, where p is the number of poles and $F_s$ is the number of frames per second. For 
high quality speech ($f_s$ = 8 kHz), the average frame rate used in this work is 50. Using the 
Atal and Hanauer formula, the bit rates in bits per second (bps) achievable for the various 
numbers of poles p used in this work (p = 2, 4, 6, 8, 10, and 12) can be computed. The bit 
rates for LPCp (with p = 2, 4, 6, 8, 10, and 12) thus computed are tabulated in Table 5.2. 


[Figure 5.3: LPC Analysis and Synthesis of Speech] 

[Figure 5.4: Signal and LPC Spectrum] 


Table 5.2: Poles and Bit Rate 

No.   poles (p)   Vocoder   bit rate (bps) 
1     2           LPC2      1100 
2     4           LPC4      1600 
3     6           LPC6      2100 
4     8           LPC8      2600 
5     10          LPC10     3100 
6     12          LPC12     3600 
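
The entries of Table 5.2 can be reproduced from the Atal and Hanauer formula with the short Python sketch below. The function name `lpc_bit_rate` is chosen only for this illustration.

```python
def lpc_bit_rate(p, frames_per_second=50):
    """Bit rate in bits per second: (5p + 12) bits per frame times the frame rate."""
    return (5 * p + 12) * frames_per_second

# Bit rates for the model orders used in this work.
rates = {p: lpc_bit_rate(p) for p in (2, 4, 6, 8, 10, 12)}
for p, bps in rates.items():
    print(f"LPC{p}: {bps} bps")
```

For p = 12 this gives 72 bits per frame, or 3600 bps at 50 frames per second, as compared with 64000 bps for the uncompressed signal.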




















5.9. SYNTHESIS OF PUNJABI SENTENCES 


Twenty-five Punjabi sentences were selected from the newly-designed corpus (in 
Chapter IV) for the purpose of this project so that a representative analysis and synthesis 
can be conducted. These sentences have been listed in Appendix E. In the following 
description, ‘# x’ stands for ‘sentence number x’ in Appendix E. While choosing these 
sentences, the themes taken into consideration included the vowels and consonants, 
tonemes, nasals, semi-vowels, extended pronunciation [139, pp. (®)] and some special 
Punjabi speech sounds as follows: 


Cnet weuos, SLEOH: US SOIC SF, Ways, Igy Jog, JJ 
1. VOWELS (5 sentences): 
Three Matra Vahaks (vowel forms), two long and short vowels (i & I; u & U) 
Gwe, fod / ule, Hd / WS (#1, 2, 3, 24, 25) 
2. TONEMES (6 sentences): 
w(2) 83 dug (#6, 22, 33, 9, 36, 38) 
3. NASALS (5 sentences): 
2 ZEO HX (# 30, 34, 35, 11, 12) 
4. SEMI-VOWELS (2 sentences): 
W = (39, 19) 
5. SPECIAL PUNJABI SOUNDS (6 sentences): 
&, 0, 3, g, 3, F (#7, 8, 10, 83, 18, 41) 
6. EXTENDED PRONUNCIATION (2 sentences):
JIS / JIM (Green / defeat) (# 4) 
wat/ WA™ (Water Pot made of clay / make jewelry) (# 22) 


7. TONE (2 Sentences) 
Jd (#4, 19) 

The total of the above counts (5 + 6 + 5 + 2 + 6 + 2 + 2 = 28) is higher than 25 because
three of the 25 sentences (# 4, 19, and 22) belong to more than one category. The details of
these 25 sentences are given below:
1. VOWELS (5 sentences): @ 4 ¥, fod / the, Hd / Hd (# 1, 2, 3, 24, 25) 
C1. Gis! Guer fag i? Cast ez! 
2. wHal wag, wid Mos vies S nr 
@3. tne f'eg, todo fect 8a fanre| 
24. fora so, Wa ttl 
25. Glad A UT! 


2. NASALS (5 sentences): 2 & = OH (# 30, 34, 35, 11, 12) 
Si. BA Sat da 3S Ca ST 
H12. Hyd ct Ht, HH HT, HAT HAG, AS Tel 
= 30. ase stadt? a! agg tay vise uct 
234. Ae val Ass agg Uh, S HHS ATI 
= 35. Adel fe ad Sage, Wel-wWe We fomrll 


3. TONEMES (6 sentences): YX2 ¥ JUS (#6, 9, 22, 33, 36, 38) 
we. Shura Ad yfiorg & wart or fame) 
v9. Ute dehi o de afer, feadt ce age! 
22. Wa HY sd sg 3a Hug, Saat wan ct fis 
¥ 33. Sy one =, gve vate 
036. Ge ead ue ad 3d, aot fev aed Tet 
338. SIS-TH O, Sfavr sas, ATS"|I 


4. SPECIAL SENTENCES (9 sentences): (# 4, 7, 8, 10, 18, 19, 39, 41, 83)
Ju. TAH SIA STM AIS che fat 


s7. fetd sug dest este dg cal 
58. Od-Od ss, Slada Saas AH
310. 3dAH, J 33da He ada Jase fho al 
518. HS HS adh, HS HS uSet 

19. ZTew gat 6 fhe A, fest 83 Hg egal 
39. Wd Wed ud 3 vga, Usst } UG Ja"ll 
Zui. Ust dz Had OS Saad, aH aS YS Sool 
¢ 83. GHE gad ed ot dg YT ¥! 


5.10. CONCLUSION 


Twenty-five representative sentences (listed above), sampled at the sampling
frequency f_s = 8 kHz, were processed for linear prediction analysis and synthesis of
Punjabi speech using the methods discussed in the previous sections. The duration of these
sentences, spoken by different male and female adult speakers, is between 2.0 and 4.0
seconds. The sentences are representative for speech processing in the sense that they are
made up of all typical speech sounds (Chapter II), i.e., voiced, unvoiced, plosive, nasal and
non-nasal sounds, vowels, consonants, and tonemes.


After analysis, the Punjabi speech sentences were synthesized with different 
numbers of poles (p). Results of the informal perceptual listening tests can be summed up 
as follows: 


(i) The quality of the synthesized speech for p = 12 was found to be almost as 
good as the original speech. Increasing p beyond 12 did not show any significant 
change/improvement in the quality of the synthesized speech thereby leading to 
the conclusion that p = 12 is sufficient to provide an adequate representation of 
the speech signals. 


(ii) Slight degradation in the quality of synthesized speech was noticeable when 
p was decreased to 8 especially in nasal and plosive sounds (Table 5.2). 


(iii) Although poor in quality, the synthesized speech for p as low as 2 was
intelligible (Table 5.2).


(iv) The quality of the synthesized speech was better when the Hamming window
was used in the autocorrelation method than when no explicit window (i.e., an
implicit rectangular window) was used.


(v) The quality of the nasal, plosive and voiced fricative sounds in the
synthesized speech was not as good as the quality of the voiced, non-nasal or
unvoiced sounds. This was expected, given the limitations of the simplified
all-pole model.


(vi) The analysis/synthesis of the Punjabi speech sentences was investigated by
varying the algorithm used for pitch detection (SIFT, Gold-Rabiner pitch tracker,
autocorrelation pitch tracker, ZFF, RAPT). No significant improvement was
noticed in the overall performance of the speech synthesizer when different pitch
detection algorithms were used. This is consistent with the well-known result that
no single pitch detection algorithm is uniformly top-ranked across all
environments [90, 99].
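
Of the pitch detection methods named in item (vi), the autocorrelation tracker is the simplest to illustrate. The following Python sketch is a bare-bones version under stated assumptions: the function name, the 50-400 Hz search range, and the synthetic 200 Hz test tone are illustrative choices and are not taken from the implementation used in this work.

```python
import math

def autocorr_pitch(frame, fs, f_min=50.0, f_max=400.0):
    """Estimate pitch (Hz) as fs / lag, where lag maximizes the autocorrelation."""
    lag_min = int(fs / f_max)
    lag_max = int(fs / f_min)
    best_lag, best_r = lag_min, float("-inf")
    for lag in range(lag_min, lag_max + 1):
        r = sum(frame[n] * frame[n + lag] for n in range(len(frame) - lag))
        if r > best_r:
            best_lag, best_r = lag, r
    return fs / best_lag

fs = 8000
# 50 ms of a pure 200 Hz tone (pitch period = 40 samples at 8 kHz).
tone = [math.sin(2 * math.pi * 200.0 * n / fs) for n in range(400)]
estimated_pitch = autocorr_pitch(tone, fs)
print(estimated_pitch)
```

Real speech needs voicing decisions, pre-processing and smoothing across frames, which is where the algorithms listed above differ.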


These results are broadly similar to those reported by leading speech researchers
such as Atal and Hanauer [11-12, 15, 85-89, 99], Markel and Gray [75-76, 87-89, 99],
Rabiner and Schafer [86-90, 99], Oppenheim [79, 86-89, 99-100], Makhoul
[79, 99-100], Parsons [83], Benesty et al. [15], and Yegnanarayana [129], and are
supported by recent books [20, 22, 23, 41, 49, 59, 100, 125, 126].


Original waveforms, synthesized speech waveforms (p = 2 and p = 12), Fast 
Fourier Transform (FFT) spectrograms [72-73], as well as the pitch, intensity and 
formants plots generated by MATLAB and Praat for several representative sentences 
analyzed and synthesized in this project are presented in the next chapter. 


This chapter confirms that the nature of this project is highly interdisciplinary.
The work requires not only knowledge of the Punjabi language, the Gurmukhi script, the
Malwai dialect, linguistics and phonetics (Chapters I-IV), but also knowledge of
computing science, information technology, engineering, mathematics, and statistics
to implement the speech analysis and synthesis described in this chapter.

CHAPTER VI
GRAPHICAL ANALYSIS


6.1. INTRODUCTION 


This chapter includes the graphical analysis of the results obtained in Chapter II 
and Chapter V. The graphical analysis consists of the following items: 

(a) Spectrographic analysis, formant frequencies as well as the pitch and intensity 
variations of speech waveforms analyzed and synthesized in Chapter V. 

(b) Graphical evaluation of the new phonetic coding scheme PUNJARPAbet 
designed in Chapter II by using corpus, and the typing rates of different coding schemes. 


Before the spectrographic analysis is given, the necessary concepts, drawn from
internet sources [154-160], are presented as background (in addition to that presented in
Chapters I to V) so that the reader can follow the analysis. These concepts include:
Vocal Tract and Linear Prediction Model, Spectrograms, Components of Sound, Praat and
MATLAB, Spectral Characteristics, and Sentences used for analysis/synthesis.


For an adult male vocal tract (length = 17 cm), the first three unconstricted
resonant frequencies generally fall at about f1 = 500 Hz, f2 = 1.5 kHz, and f3 = 2.5 kHz.
Although the higher formants (resonances of the vocal tract) do contribute to producing
speech sounds of acceptable quality, perceptually only the first three formants mentioned
here are important in determining the sound that is heard. The speech spectrum typically
rolls off at -12 to -18 decibels/octave from about 1 kHz on. Consistent with these
observations, spectral periodograms (log magnitude vs frequency) have been plotted
using the FFT [72-73] to gain additional insight into the perceptual listening tests of the
synthesized speech in Chapter V. The graphs for formant frequencies [38-40, 99] as well
as the pitch and intensity variations are also presented in this chapter.
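
The resonance values quoted above agree with the standard quarter-wavelength model of a uniform tube closed at one end (the glottis) and open at the other (the lips), f_n = (2n - 1)c / (4L). A short Python check, assuming a speed of sound of c = 340 m/s (this value and the function name are assumptions, not taken from the text):

```python
def tube_resonances(length_m, count=3, speed_of_sound=340.0):
    """Resonances (Hz) of a uniform tube closed at one end: f_n = (2n - 1) c / (4 L)."""
    return [(2 * n - 1) * speed_of_sound / (4.0 * length_m)
            for n in range(1, count + 1)]

formants = [round(f) for f in tube_resonances(0.17)]
print(formants)
```

For L = 0.17 m this gives resonances at roughly 500 Hz, 1500 Hz and 2500 Hz, matching f1, f2 and f3 above.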


Some of the features of the corpus designed in Chapter IV and transcribed using
PUNJARPAbet (designed in Chapter II) have also been graphically evaluated in this
chapter. The graphical analysis summarized in the figures and tables of this chapter
demonstrates that PUNJARPAbet is a faster, versatile, efficient and convenient phonetic
coding scheme. It is free from the laborious and time-consuming need to deal with special
symbols for vowels, nasalization, and tones, and to insert diacritical marks (especially
where two diacritical marks over the ten vowel signs are needed for the Punjabi language
in Gurmukhi script). PUNJARPAbet is an efficient coding scheme not only for the Punjabi
language, but also for any language which has sounds similar to the ones found in Punjabi
speech.


> The results obtained in this chapter have been published in Journal of Circuits, Systems, and Computers (JCSC), Volume 23, No.
5, June 2014 (SCI) and PARKH, Volume II, July-Dec 2013. The partial results of this chapter have also been presented in the
international conference INSODE-2014 (Rome, Italy).


6.2. VOCAL TRACT AND LINEAR PREDICTION MODEL 


The relationship between the physical (vocal tract) model and the mathematical
(linear prediction, or LPC) model helps in understanding the quantities (pitch,
intensity, and formant frequencies) graphed in this chapter. This relationship can be
stated as follows [154]:

• The vocal tract is equivalent to the LPC filter with transfer function H(z).
• Vocal cord vibration is equivalent to V (voiced speech).
• Fricatives and plosives are equivalent to UV (unvoiced speech).
• The vocal cord vibration period is equivalent to P (pitch period).
• Air is equivalent to the excitation function u_n.
• Air volume (loudness) is equivalent to the gain (G).


The LPC model leads to significant savings, as described below.

Assume we are dealing with the speech coder LPC10 (the model with p = 10 LP
coefficients). For the sampling frequency f_s = 8 kHz, average frame rate F_r = 50, and
frame size N = 20 ms = 160 samples, the vector A generated by LPC10 consists of the
following 13 values:


A = (a1, a2, a3, a4, a5, a6, a7, a8, a9, a10, V/UV, P, G).


However, this vector completely and compactly represents the following 160 values
of the original speech signal described by the vector S:


S = (s0, s1, s2, s3, ..., s159).


This is how the computational savings are achieved by the linear prediction model of 
speech production. This phenomenon is described in this chapter (from a slightly 
different angle than that expressed in Chapter V) for the sake of completeness. 
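
The counts in this example can be checked with a few lines of Python. The sketch below only counts values per frame and per second; it says nothing about how many bits each value takes, which is the subject of Section 5.8. Variable names are illustrative.

```python
fs_hz = 8000        # sampling frequency f_s
frame_ms = 20       # frame size in milliseconds
frame_rate = 50     # frames per second F_r
p = 10              # number of LP coefficients in LPC10

samples_per_frame = fs_hz * frame_ms // 1000   # length of vector S
params_per_frame = p + 3                       # a1..a10 plus V/UV, P, G
params_per_second = params_per_frame * frame_rate

print(samples_per_frame, params_per_frame)
print("values per second:", fs_hz, "->", params_per_second)
print("reduction factor: %.1f" % (samples_per_frame / params_per_frame))
```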


6.3. SPECTROGRAMS 


A spectrogram is a visual representation of the spectrum of frequencies in a sound or
other signal as they vary with an independent variable such as time [156-157]. Various
authors have called spectrograms by other names, such as sonagrams, voicegrams,
voiceprints, or spectral waterfalls. In general, the independent variable (time) is displayed
on the horizontal axis and frequency (on a linear or logarithmic scale) on the vertical
axis, while the amplitude at each time and frequency is indicated by the darkness or
colour of the display.

Spectrograms are usually created either by analog processing (the bandpass filter
method) or by digital processing (the FFT method). Analog processing was the only
methodology available before the digital computer revolution.


The instrument that generates a spectrogram is called a spectrograph or sonagraph. In
spectrograms, the amplitude of the frequency components “is expressed by means of the
degree of blackness (more energy, more blackness). As formants are frequency regions
with a high energy (due to the filter resonances), in the spectrogram they are displayed
as dark bands” [157]. In the spectrograms presented in this chapter, the frequency range
from 0 to 4 kHz is displayed because the sampling frequency used is f_s = 8 kHz, so the
highest representable frequency is f_s/2.


Spectrograms can be used to identify spoken words phonetically and to analyze the
various calls of animals. In addition to the processing of speech signals, they have been
used extensively in music, sonar, radar, and seismology [156]. The applications most
closely related to our project are the study of animal sounds, medical assistance with
speech defects and speech training, the study of phonetics, and speech synthesis.


6.4. THE COMPONENTS OF SOUND 


Different sounds (e.g., thunderstorms, vehicle sounds, animal sounds, and human sounds)
sound different to us because of three properties of sound: intensity, pitch, and tone
[159]. Owing to variations in these properties, aeroplanes and fire alarms are loud,
whispers are soft, and each of our family members, relatives, and friends has a
different voice.


Intensity 


Like any other wave, a sound wave has a height, or amplitude. Amplitude is a measure
of energy: the more energy a wave has, the higher its amplitude. As amplitude
increases, intensity also increases. Intensity is the amount of energy a sound delivers per
unit area per unit time. The same sound is more intense if it is confined to a smaller area.
In general, we call sounds with a higher intensity louder. Sounds can be loud (e.g.,
yelling) or soft (e.g., breathing). Loudness cannot be assigned a specific number, but
intensity can. Intensity is measured in decibels (dB) with instruments. A whisper is about
10-30 dB, normal conversation is about 60 dB, and a lawn mower is about 100 dB.
Listening to loud sounds (sounds with intensities above 85 dB) can damage human
hearing because the human ear is very sensitive. Sounds over 120 dB (the threshold of
pain) are painful to listen to [159].
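
The decibel figures above are logarithmic: the intensity level is L = 10 log10(I / I0). The Python sketch below assumes the conventional reference intensity I0 = 10^-12 W/m^2 (the threshold of hearing); this reference value and the function names are assumptions and are not given in [159].

```python
import math

I0 = 1e-12  # assumed reference intensity in W/m^2 (conventional threshold of hearing)

def intensity_level_db(intensity_w_m2):
    """Intensity level in dB relative to I0."""
    return 10.0 * math.log10(intensity_w_m2 / I0)

def intensity_from_db(level_db):
    """Intensity in W/m^2 for a given level in dB."""
    return I0 * 10.0 ** (level_db / 10.0)

conversation_db = intensity_level_db(1e-6)
ratio = intensity_from_db(100.0) / intensity_from_db(60.0)
print(conversation_db, ratio)
```

On this scale, the 40 dB step from normal conversation (60 dB) to a lawn mower (100 dB) corresponds to a factor of 10,000 in intensity.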
Pitch


Pitch helps us to distinguish between low and high sounds. If a singer sings the same note
twice (say, one an octave above the other), we can hear a difference between the two
sounds because their pitch is different. Adult males generally have low-pitched voices,
whereas women and young children have higher-pitched voices. Pitch depends on the
frequency of a sound wave. Frequency is the number of complete cycles of the wave per
unit of time and is measured in hertz (Hz). One Hz is equal to one cycle of
compression and rarefaction per second. High sounds have high frequencies and low
sounds have low frequencies. Thunder has frequencies near 50 Hz, whereas
a whistle can have a frequency of 1 kHz. The human ear is able to hear frequencies from
about 20 Hz to 20 kHz. We cannot hear sounds with higher frequencies (e.g., a dog
whistle), although some animals can. Sounds that are too high for us to hear are called
ultrasonic [159].


Tone (or sound quality) 


Some sounds are pleasant while others are unpleasant, because they have a different tone, 
or sound quality. Several sounds (e.g. music, sound of rain) are pleasant, while others 
(e.g., hammer, construction work, an inexperienced learner playing a musical instrument) 
are unpleasant to listen to. Both music and noise are sounds, but music is much more 
enjoyable than the noise. To help classify sounds, there are three properties which a 
sound must have to be musical. A sound must have an identifiable pitch, a good or 
pleasing quality of tone, and repeating pattern or rhythm to be music. Noise on the other 
hand has no identifiable pitch, no pleasing tone, and may have no steady rhythm [159]. 


When a source vibrates, it actually vibrates at many frequencies at the same time, and
each of those frequencies produces a wave. Sound quality depends on the combination of
these frequencies. The lowest of them, which determines the pitch, is called the
fundamental frequency. The whole-number multiples of the fundamental are called
harmonics. In most cases, more harmonics mean a better (or fuller) sound quality. All the
different overtones (frequencies higher than the fundamental) of a sound help give it a
unique pattern. This is especially true for a person’s voice. Everybody in the world has a
different voice print, or pattern of overtones. A criminal can be tracked if the criminal’s
voice print is known, just as with fingerprints (see Sections 1.7 and 3.5). Voice
identification equipment is used in advanced security systems to recognize and admit
only an authorized person. Voice prints are also used in modern technology, for example,
voice-activated telephones [159].


6.5. PRAAT AND MATLAB


Praat (the Dutch word for "talk") is a free, open-source scientific software package for
the analysis of speech in phonetics [157]. It was originally designed, and continues to be
developed, by Paul Boersma and David Weenink of the University of Amsterdam
[157-158].

MATLAB (short for matrix laboratory), a multi-paradigm numerical computing
environment and fourth-generation programming language, is developed by MathWorks.
MATLAB allows matrix manipulations, plotting of functions and data,
implementation of algorithms, creation of user interfaces, and interfacing with programs
written in other languages, e.g., C, C++, Java, and FORTRAN [160]. Intended primarily
for numerical computing, MATLAB has gained considerable popularity in signal
processing, including digital speech processing [36, 121, 160].


Both Praat and MATLAB have been extensively used in the graphical analysis 
presented in the next sections. 


6.6. SPECTRAL CHARACTERISTICS 


The spectral characteristics of the speech sentences analyzed and synthesized 
using the linear prediction model as discussed in Chapter V are graphically presented in 
this chapter as Spectrograms (time-varying representations forming images). The 
frequency-axis can be either linear or logarithmic [156]. The spectrograms of a speech 
signal are computed by using the Short-time Fourier Transform as outlined here. 


The DFT (Discrete Fourier Transform) of a finite sequence s(n) (0 ≤ n ≤ N−1) and
its inverse DFT are defined as:


$$S(k) = \sum_{n=0}^{N-1} s(n)\, e^{-j(2\pi/N)nk}, \qquad 0 \le k \le N-1 \tag{6.1}$$

$$s(n) = \frac{1}{N} \sum_{k=0}^{N-1} S(k)\, e^{j(2\pi/N)nk}, \qquad 0 \le n \le N-1 \tag{6.2}$$


The log magnitude spectrum of the input data, LM(S), and that of the output of the model,
LM(G/A), are taken as:


$$LM(S) = 10 \log_{10} |S(k)|^{2} \tag{6.3}$$

and

$$LM(G/A) = 10 \log_{10} \frac{G^{2}}{|A(k)|^{2}} \tag{6.4}$$

where $0 \le k \le N-1$.
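
Equations (6.1) to (6.3) can be evaluated directly, as in the Python sketch below. It uses the O(N^2) definition rather than an FFT, a synthetic cosine as input, and a small floor inside the logarithm to avoid log10(0); the floor and the function names are assumptions made for the example. A spectrogram is obtained by repeating this computation on successive short (windowed) frames of the signal.

```python
import cmath
import math

def dft(s):
    """Direct evaluation of Eq. (6.1)."""
    N = len(s)
    return [sum(s[n] * cmath.exp(-2j * math.pi * n * k / N) for n in range(N))
            for k in range(N)]

def idft(S):
    """Direct evaluation of Eq. (6.2)."""
    N = len(S)
    return [sum(S[k] * cmath.exp(2j * math.pi * n * k / N) for k in range(N)) / N
            for n in range(N)]

def log_magnitude(S, floor=1e-12):
    """Eq. (6.3) in dB; the floor (an assumption) guards against log10(0)."""
    return [10.0 * math.log10(max(abs(v) ** 2, floor)) for v in S]

N = 64
signal = [math.cos(2 * math.pi * 5 * n / N) for n in range(N)]  # 5 cycles per frame
spectrum = dft(signal)
lm = log_magnitude(spectrum)
peak_bin = max(range(N // 2 + 1), key=lambda k: lm[k])
roundtrip_error = max(abs(v - s) for v, s in zip(idft(spectrum), signal))
print("peak bin:", peak_bin)
```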




6.7. SENTENCES USED FOR ANALYSIS/SYNTHESIS 


Many different approaches can be taken to the spectrographic analysis of a speech
signal, and a considerable number of graphs are involved in the spectrographic analysis
of even a single sentence. To keep the volume under control, 25 sentences (listed in
Appendix E) have been selected for this project so that a representative
analysis/synthesis can be conducted (‘# x’ stands for ‘sentence number x’ as listed in
Appendix E). While choosing these sentences, the themes taken into consideration
included vowels and consonants, tonemes, nasals, semi-vowels, extended pronunciation
[139, pp. (®)] and some special Punjabi speech sounds, as follows:


Cue wetos; SSEGH WS SOS SF, ways, JId/ SIM, TT 


1. VOWELS (5 sentences):
Three Matra Vahaks, two long and short vowels (i & I; u & U)
Cuy, fod/ dle, HI/ AS (#1, 2, 3, 24, 25)
2. TONEMES (6 sentences):
wW (2) ¥ SU sg (#6, 22, 33, 9, 36, 38)
3. NASALS (5 sentences):
2 Ze OH (# 30, 34, 35, 11, 12)
4. SEMI-VOWELS (2 sentences):
W (# 39, 19)
5. SPECIAL PUNJABI SOUNDS (6 sentences):
e, 0, 3, &, 3, (#7, 8, 10, 83, 18, 41)
6. EXTENDED PRONUNCIATION (2 sentences):
Jd/ JIM (Green / defeat) (# 4)
wat/ Wa™ (Water Pot made of clay / make jewelry) (# 22)
7. TONE (2 sentences):
Jd (#4, 19)


The total of the above counts (5 + 6 + 5 + 2 + 6 + 2 + 2 = 28) is higher than 25 because
three of the 25 sentences (# 4, 19, and 22) belong to more than one category. The details of
these 25 sentences are given below:


1. VOWELS: 8 ¥, fod / td, Hd / Ad (5 sentences: # 1, 2, 3, 24, 25) 
C1. Gey! Quer fag wy Gast ez!

2. wHe!l ag, nig nod nied S yr] 

3. whee fle, wade ict 8a for] 
24. fos aa, Aa dtl 

25. GH, AD UT! 


2. NASALS (5 sentences): 2 = & 0 H (# 30, 34, 35, 11, 12) 


Bll. BA Sat da ’SS CAST 

H12. Hyd StH, HH HHT, HAT HAG, AS Te! 
530. ase serGdle a! aos ta vrise ust 
234. qe ug Ass aga uh, S Ages AAT 
=35. Adel ore ad sana, Wel-we Are fom 


3. TONEMES (6 sentences): YX2 ¥ TUS (# 6, 9, 22, 33, 36, 38) 


wo. Sora) yfimrg a wer a fomr 

Zo. Ute ddht o de dio, feadt ce age! 

22. wa Hy sd sg 3o Hug, saat want fae ll 
333. Syonee, ave saree 

036. Geese ad 3d, ag fev aed Tet 

338. SIS-SH S, Slow saIsZ, ATS" Il 


4. SPECIAL SENTENCES (9 sentences): (# 4, 7, 8, 10, 18, 19, 39, 41, 83) 


34. vaHsdngsgonags de ff 
s7. fetdssuq dvestest ede cal 
o 8. Qdodes, dadagaas ddl 




310. 3dAH, 33308 de age Sass fio ad! 
B18. HSH oCdt, us HS uc 

19. PITec gat > fhe A, feet 63 Hig egal 
W739. Wdted ua 3 vga, ssi } us sda" ll 
241, Ustugdit Had oS Sad, AA aA aT sot 
283. gHE gad egal dgugy! 


6.8. SPECTROGRAPHIC ANALYSIS (FORMANTS, PITCH, INTENSITY) 


The graphical analysis of the 25 sentences synthesized in this project is
summarized in Table 6.1. It gives an overview of four values (pitch, intensity, and the
first two formants) for the following five signals: original sentence, residual signal (p =
2), synthesized signal (p = 2), residual signal (p = 12) and synthesized signal (p = 12),
where p stands for the number of poles.


Formant frequencies [38-40, 99] play an important role in distinguishing various speech
sounds. We have compared the original speech spectrum for various sentences and bolis
with the synthesized and residual spectra obtained with 2 poles and 12 poles (LPC)
for the different groups of sentences, as shown in Tables 6.2, 6.4, 6.6, and 6.8. The
pitch and intensity variations for the same groups are shown in Tables 6.3, 6.5, 6.7, and
6.9.
Table 6.1: Summary of the Graphical Analysis of the 25 Synthesized Sentences

April 2014

(Pitch and formants F1, F2 in Hz; intensity in dB; p = number of poles; -- = not available)

       Original                             Residual                        Synthesized
Sent.  Pitch  Int.   F1      F2        p    Pitch  Int.   F1      F2        Pitch  Int.   F1      F2
1      166.5  71.28  543.12  1423.9    2    157.9  73.55  813.35  1867.02   166.5  74.28  543.15  1423.99
                                       12   157.1  69.64  932.04  1885.8    166.5  74.28  543.15  1423.99
2      145.2  70.96  630.37  1491.01   2    141.7  71.01  915.34  1773.86   145.2  75.86  630.39  1491.04
                                       12   138.2  68.68  966.24  1880.12   145.2  75.86  630.39  1491.04
3      155    69.16  529.88  1624.99   2    146.4  69.5   812.62  1839.09   155    73.54  529.85  1624.97
                                       12   146.9  68.27  958.37  1895.58   155    73.54  529.85  1624.97
4      133.4  69.84  622.18  1482.8    2    135.2  71.03  887.84  1824.15   133.4  73.28  622.17  1482.81
                                       12   134.6  69.34  956.8   1886.91   133.4  73.28  622.17  1482.81
6      140.3  73.79  558.27  1438.82   2    140.7  71.87  862.04  1769.25   140.3  75.98  558.26  1438.84
                                       12   137.5  69.91  946.2   1867.03   140.3  75.98  558.26  1438.84
7      158.8  66.47  530.95  1785.74   2    161.3  70.9   748.45  1928.29   158.8  74.35  530.94  1785.71
                                       12   161.7  66.26  958.22  1912.12   158.8  74.35  530.94  1785.71
8      158.7  70.33  566.71  1484.24   2    161    71.3   848.51  1829.93   158.7  73.77  566.68  1484.24
                                       12   160.4  70.05  938.81  1896.69   158.7  73.77  566.68  1484.24
9      161.3  74.14  537.18  1588.24   2    168.6  72.93  834.06  1866.62   161.3  75.59  537.31  1588.11
                                       12   172.5  69.58  966.5   1898.47   161.3  75.59  537.31  1588.11
10     162.5  73.32  561.35  1419.51   2    163.1  72.25  869.22  1786.6    162.5  74.42  561.4   1419.77
                                       12   157.2  69.39  938.62  1864.63   162.5  74.42  561.4   1419.77
11     157.1  72.62  475.43  1554.53   2    158.1  71.99  795.26  1892.32   157.1  76.06  475.41  1554.51
                                       12   159.2  70.37  942.24  1887.42   157.1  76.06  475.41  1554.51
12     149.5  74.73  512.55  1533.26   2    148.6  71.27  800.06  1830.18   149.5  76.92  512.55  1533.24
                                       12   147.3  69.24  948.91  1879.22   149.5  76.92  512.55  1533.24
18     147.8  74.01  490.24  1545.77   2    148.2  70.28  835.08  1883.84   147.8  74.45  490.29  1545.61
                                       12   150.9  67.96  942.47  1881.45   147.8  74.45  490.29  1545.61
19     155.8  73.52  490.23  1628.65   2    158.8  71.48  762.39  1904.47   155.8  74.97  490.38  1628.65
                                       12   159.6  69.05  958.56  1901.31   155.8  74.97  490.38  1628.65
22     156.9  75.99  594.35  1381.16   2    158.9  71.1   926.21  1706.66   156.9  75.82  594.28  1381.03
                                       12   160.8  69.45  948.46  1859.55   156.9  75.82  594.28  1381.03
24     152.3  69.85  636.17  1570.86   2    152.8  73.06  868.75  1808.57   152.3  77.01  636.17  1570.8
                                       12   152.9  70.44  958.34  1889.63   152.3  77.01  636.17  1570.8
25     162.7  70.73  604.33  1427.35   2    158.5  69.4   891.76  1818.15   162.7  75.13  604.34  1427.38
                                       12   158.4  67.65  961.8   1899.15   162.7  75.13  604.34  1427.38
30     167.4  68.1   510.9   1594.72   2    168.5  72.68  792.14  1924.11   167.4  76.37  510.86  1594.7
                                       12   170.4  69.34  948.37  1910.28   167.4  76.37  510.86  1594.7
33     174.4  71.55  488.7   1666.96   2    171.3  69.82  745.22  1920.43   174.4  75.45  488.63  1666.94
                                       12   173.2  68.22  955.41  1923.52   174.4  75.45  488.63  1666.94
34     180.5  70.55  565.94  1693.76   2    177.9  73.44  823.11  1872.57   180.5  76.15  566.01  1693.81
                                       12   179.5  70.16  967.44  1902.78   180.5  76.15  566.01  1693.81
35     --     --     527.21  1691.35   2    161.9  72.02  796.71  1918.5    158.9  75.86  527.19  1691.35
                                       12   162.1  69.73  962.87  1902.35   158.9  75.86  527.19  1691.35
36     164    69.78  503.08  1657.27   2    164.6  71.64  791.25  1877.79   164    76.29  503.06  1657.26
                                       12   166.7  69.02  955.9   1895.07   164    76.29  503.06  1657.26
38     161.6  70.4   563.71  1386.34   2    164.6  71.55  876.93  1810.85   161.6  75.99  563.75  1386.35
                                       12   172.2  68.96  937.22  1874.5    161.6  75.99  563.75  1386.35
39     178.9  71.7   521.41  1576.03   2    184.9  72.71  773.82  1848.37   178.9  75.15  521.41  1576.07
                                       12   182.4  70.25  954.68  1893.68   178.9  75.15  521.41  1576.07
41     --     --     --      --        2    --     --     --      --        --     --     --      --
                                       12   167.9  70.89  947.34  1880.75   164.9  76.29  523.13  1539.26
83     --     --     --      --        2    --     --     --      --        --     --     --      --
                                       12   202.9  71     861.21  1668.58   200.8  75.87  465.11  1335.08

6.8.1. SPECTROGRAPHIC ANALYSIS: VOWEL ANALYSIS


This subsection concentrates on the analysis of the five sentences involving the three
vowel forms (ੳ, ਅ, ੲ), as well as the contrast between the matras ਿ and ੀ (I and i), and
ੁ and ੂ (U and u). It has been found that the formant frequencies of the synthesized
speech sentences are approximately the same as those of the original sentences. However,
the formant frequencies are considerably higher for the residual signals. For p = 12
(LPC), the formant frequencies of the residual signal are greater than those of the
residual signal for p = 2, as expected. The results are shown in Table 6.2; the pitch and
intensity variations are shown in Table 6.3.


Table 6.2: Vowel Analysis (First two formants: 5 sentences)

Sr. No./      Signal               No. of   Formant I   Formant II
Sentence #                         poles    (Hz)        (Hz)
1/1           Original             -        543.12      1423.9
              Synthesized Signal   2        543.15      1423.99
              Synthesized Signal   12       543.15      1423.99
              Residual Signal      2        813.35      1867.02
              Residual Signal      12       932.04      1885.8
2/2           Original             -        630.37      1491.01
              Synthesized Signal   2        630.39      1491.04
              Synthesized Signal   12       630.39      1491.04
              Residual Signal      2        915.34      1773.86
              Residual Signal      12       966.24      1880.12
3/3           Original             -        529.88      1624.99
              Synthesized Signal   2        529.85      1624.97
              Synthesized Signal   12       529.85      1624.97
              Residual Signal      2        812.62      1839.09
              Residual Signal      12       958.37      1895.58
4/24          Original             -        636.17      1570.86
              Synthesized Signal   2        636.17      1570.8
              Synthesized Signal   12       636.17      1570.8
              Residual Signal      2        868.75      1808.57
              Residual Signal      12       958.34      1889.63
5/25          Original             -        604.33      1427.35
              Synthesized Signal   2        604.34      1427.38
              Synthesized Signal   12       604.34      1427.38
              Residual Signal      2        891.76      1818.15
              Residual Signal      12       961.8       1899.15

Table 6.3: Vowel Analysis (Pitch and Intensity: 5 sentences)

Sr. No./      Signal               No. of   Pitch     Intensity
Sentence #                         poles    (Hz)      (dB)
1/1           Original             -        166.54    71.28
              Synthesized Signal   2        166.53    74.28
              Synthesized Signal   12       166.53    74.28
              Residual Signal      2        157.94    73.55
              Residual Signal      12       157.06    69.64
2/2           Original             -        145.19    70.96
              Synthesized Signal   2        145.18    75.86
              Synthesized Signal   12       145.18    75.86
              Residual Signal      2        141.69    71.01
              Residual Signal      12       138.13    68.68
3/3           Original             -        154.96    69.16
              Synthesized Signal   2        154.95    73.54
              Synthesized Signal   12       154.95    73.54
              Residual Signal      2        146.4     69.5
              Residual Signal      12       146.87    68.27
4/24          Original             -        152.26    69.85
              Synthesized Signal   2        152.26    77.01
              Synthesized Signal   12       152.26    77.01
              Residual Signal      2        152.8     73.06
              Residual Signal      12       152.91    70.44
5/25          Original             -        162.65    70.73
              Synthesized Signal   2        162.65    75.13
              Synthesized Signal   12       162.65    75.13
              Residual Signal      2        158.5     69.4
              Residual Signal      12       158.36    67.65

It has been found that

a) The pitch of the synthesized speech signal is approximately the same as that of the
original speech signal for both 2 poles and 12 poles.

b) The pitch of the residual signal is slightly lower than that of the original speech
signal (except # 24, where it is marginally higher).

c) For the residual signal, the pitch is slightly higher with 2 poles than with 12 poles
for # 1, 2 and 25; for # 3 and 24 it is marginally lower.

d) Compared to the original speech signal, the intensity of the residual signal is
slightly higher with 2 poles (except # 25) and lower with 12 poles (except # 24).
6.8.2. SPECTROGRAPHIC ANALYSIS: NASAL ANALYSIS


This subsection concentrates on the analysis of the five sentences involving the five
nasals (ਙ, ਞ, ਣ, ਨ, ਮ). It has been found that the formant frequencies of the synthesized
speech sentences are approximately the same as those of the original sentences. However,
the formant frequencies are considerably higher for the residual signals. For p = 12
(LPC), the first formant of the residual signal is greater than that of the residual signal
for p = 2, as expected; the second formant is greater for # 12 and 34 and slightly lower
for # 11, 30 and 35. The results are shown in Table 6.4; the pitch and intensity variations
are shown in Table 6.5.


Table 6.4: Nasal Analysis (First two formants: 5 sentences)

Sr. No./      Signal               No. of   Formant I   Formant II
Sentence #                         poles    (Hz)        (Hz)
1/11          Original             -        475.43      1554.53
              Synthesized Signal   2        475.41      1554.51
              Synthesized Signal   12       475.41      1554.51
              Residual Signal      2        795.26      1892.32
              Residual Signal      12       942.24      1887.42
2/12          Original             -        512.55      1533.26
              Synthesized Signal   2        512.55      1533.24
              Synthesized Signal   12       512.55      1533.24
              Residual Signal      2        800.06      1830.18
              Residual Signal      12       948.91      1879.22
3/30          Original             -        510.9       1594.72
              Synthesized Signal   2        510.86      1594.7
              Synthesized Signal   12       510.86      1594.7
              Residual Signal      2        792.14      1924.11
              Residual Signal      12       948.37      1910.28
4/34          Original             -        565.94      1693.76
              Synthesized Signal   2        566.01      1693.81
              Synthesized Signal   12       566.01      1693.81
              Residual Signal      2        823.11      1872.57
              Residual Signal      12       967.44      1902.78
5/35          Original             -        527.21      1691.35
              Synthesized Signal   2        527.19      1691.35
              Synthesized Signal   12       527.19      1691.35
              Residual Signal      2        796.71      1918.5
              Residual Signal      12       962.87      1902.35

Table 6.5: Nasal Analysis (Pitch and Intensity: 5 sentences)

Sr. No./      Signal               No. of   Pitch     Intensity
Sentence #                         poles    (Hz)      (dB)
1/11          Original             -        157.06    72.62
              Synthesized Signal   2        157.06    76.06
              Synthesized Signal   12       157.06    76.06
              Residual Signal      2        158.12    71.99
              Residual Signal      12       159.2     70.37
2/12          Original             -        149.5     74.73
              Synthesized Signal   2        149.51    76.92
              Synthesized Signal   12       149.51    76.92
              Residual Signal      2        148.6     71.27
              Residual Signal      12       147.33    69.24
3/30          Original             -        167.4     68.1
              Synthesized Signal   2        167.38    76.37
              Synthesized Signal   12       167.38    76.37
              Residual Signal      2        168.45    72.68
              Residual Signal      12       170.39    69.34
4/34          Original             -        180.5     70.55
              Synthesized Signal   2        180.5     76.15
              Synthesized Signal   12       180.5     76.15
              Residual Signal      2        177.85    73.44
              Residual Signal      12       179.51    70.16
5/35          Original             -        --        --
              Synthesized Signal   2        158.87    75.86
              Synthesized Signal   12       158.87    75.86
              Residual Signal      2        161.85    72.02
              Residual Signal      12       162.1     69.73

It has been found that

a) Pitch of the synthesized speech signal is approximately the same as that of the original
signal, both in the case of 2 poles and 12 poles.

b) In two cases (H, ), the pitch of the residual sound signal is decreased as compared to the
original sound signal, and in the other three cases (©, 8, ), it is increased.

c) Intensity of the synthesized sound signal is increased as compared to the original sound
signal, both in the case of 2 poles and 12 poles.

d) Intensity of the residual sound signal decreases for two nasals (©, H), and increases for
the other three nasals (8, @, €), as compared to the original sound signal.
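
The relationship between the original, residual, and synthesized signals compared in this section can be illustrated with a minimal LPC sketch. It is not the thesis's MATLAB program: the function names, the synthetic Hann-windowed test frame, and the use of the autocorrelation (Levinson-Durbin) method are all assumptions made for illustration. Under those assumptions it shows two properties that are consistent with the tables above: a 12-pole predictor leaves less energy in the residual than a 2-pole predictor, and passing the unmodified residual through the matching all-pole filter returns the original frame for either order.

```python
import math
import random


def autocorrelation(x, max_lag):
    """Autocorrelation r[0..max_lag] of the frame x (zero outside the frame)."""
    n = len(x)
    return [sum(x[i] * x[i + k] for i in range(n - k)) for k in range(max_lag + 1)]


def lpc(x, p):
    """Order-p predictor a[0..p-1] via Levinson-Durbin (autocorrelation method).

    The prediction is x_hat[n] = sum(a[k] * x[n - 1 - k] for k in range(p)).
    """
    r = autocorrelation(x, p)
    a = []
    err = r[0]
    for i in range(1, p + 1):
        k = (r[i] - sum(a[j] * r[i - 1 - j] for j in range(i - 1))) / err
        a = [a[j] - k * a[i - 2 - j] for j in range(i - 1)] + [k]
        err *= 1.0 - k * k
    return a


def residual(x, a):
    """Inverse (analysis) filter: e[n] = x[n] - sum(a[k] * x[n - 1 - k])."""
    p = len(a)
    return [x[n] - sum(a[k] * x[n - 1 - k] for k in range(min(p, n)))
            for n in range(len(x))]


def synthesize(e, a):
    """All-pole (synthesis) filter: y[n] = e[n] + sum(a[k] * y[n - 1 - k])."""
    p = len(a)
    y = []
    for n in range(len(e)):
        y.append(e[n] + sum(a[k] * y[n - 1 - k] for k in range(min(p, n))))
    return y


def energy(x):
    return sum(s * s for s in x)


def make_frame(n=400, seed=0):
    """Hypothetical test frame: two sinusoids plus weak noise, Hann-windowed."""
    rng = random.Random(seed)
    frame = []
    for i in range(n):
        s = math.sin(2 * math.pi * 0.05 * i) + 0.5 * math.sin(2 * math.pi * 0.12 * i)
        s += 0.05 * rng.uniform(-1.0, 1.0)
        w = 0.5 - 0.5 * math.cos(2 * math.pi * i / (n - 1))
        frame.append(w * s)
    return frame


x = make_frame()
for p in (2, 12):
    a = lpc(x, p)
    e = residual(x, a)
    y = synthesize(e, a)
    # Relative residual energy and worst-case reconstruction error for this order.
    print(p, energy(e) / energy(x), max(abs(u - v) for u, v in zip(x, y)))
```

If the synthesis stage were driven by the full residual in this way, identical pitch and formant values for the 2-pole and 12-pole synthesized signals would be the expected outcome; whether the thesis's programs do so is not stated in this section.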


6.8.3. SPECTROGRAPHIC ANALYSIS: TONEME ANALYSIS


This subsection concentrates on the analysis of the six sentences involving the five tonemes
(ਘ, ਝ, ਢ, ਧ, ਭ). It has been found that the formant frequencies of the synthesized speech
sentences are approximately the same as those of the original sentences. However, the
formant frequencies of the residual signals are considerably higher. In the case of
p = 12 (LPC), the formant frequencies of the residual signal are greater than those of the
residual signal for p = 2, as expected. The resulting values are shown below (Table 6.6).
Pitch and intensity variations are shown in Table 6.7.


Table 6.6: Toneme Analysis (First two formants: 6 sentences)

Sr. No. Sentence                                Signal               No. of poles   Formant I (Hz)   Formant II (Hz)
1/6     Sfumrs thd wimg Stet or far}            Original                            558.27           1438.82
                                                Synthesized Signal   2              558.26           1438.84
                                                Synthesized Signal   12             558.26           1438.84
                                                Residual Signal      2              862.04           1769.25
                                                Residual Signal      12             946.20           1867.03

2/9     Ud 2 dein o de dfenr, feud ce age!      Original                            537.18           1588.24
                                                Synthesized Signal   2              537.31           1588.11
                                                Synthesized Signal   12             537.31           1588.11
                                                Residual Signal      2              834.06           1866.62
                                                Residual Signal      12             966.5            1898.47

3/22    wary sg ag so Hue, sing wan?            Original                            594.35           1381.16
        fst II
                                                Synthesized Signal   2              594.28           1381.03
                                                Synthesized Signal   12             594.28           1381.03
                                                Residual Signal      2              926.21           1706.66
                                                Residual Signal      12             948.46           1859.55

4/33    ag orfter, ge vate dey                  Original                            488.7            1666.96
                                                Synthesized Signal   2              488.63           1666.94
                                                Synthesized Signal   12             488.63           1666.94
                                                Residual Signal      2              745.22           1920.43
                                                Residual Signal      12             955.41           1923.52

5/6     Ge eudtus ad 3d, ao fa ae 5 TTI         Original                            503.08           1657.27
                                                Synthesized Signal   2              503.06           1657.26
                                                Synthesized Signal   12             503.06           1657.26
                                                Residual Signal      2              791.25           1877.79
                                                Residual Signal      12             955.9            1895.07

                                                Synthesized Signal   2              563.75           1386.35
                                                Synthesized Signal   12             563.75           1386.35
                                                Residual Signal      2              876.93           1810.85
                                                Residual Signal      12             937.22           1874.5


Table 6.7: Toneme Analysis (Pitch and Intensity: 6 sentences)

Sr. No. Sentence                                Signal               No. of poles   Pitch (Hz)   Intensity (dB)
1/6     Sfumrs Pd whims ews ar mr]              Original                            140.25       73.79
                                                Synthesized Signal   2              140.25       75.98
                                                Synthesized Signal   12             140.25       75.98
                                                Residual Signal      2              140.66       71.87
                                                Residual Signal      12             137.48       69.91

2/9     Ud 2 deni o de fer, feud ce aged!       Original                            161.34       74.14
                                                Synthesized Signal   2              161.34       75.59
                                                Synthesized Signal   12             161.34       75.59
                                                Residual Signal      2              168.56       72.93
                                                Residual Signal      12             171.45       69.58

3/22    wat HRY sd sg ST Hus, sing wan 2 fitz   Original                            156.86       75.99
                                                Synthesized Signal   2              156.86       75.82
                                                Synthesized Signal   12             156.86       75.82
                                                Residual Signal      2              158.86       71.1
                                                Residual Signal      12             160.79       69.45

                                                Synthesized Signal   2              174.43       75.45
                                                Synthesized Signal   12             174.43       75.45
                                                Residual Signal      2              171.31       69.82
                                                Residual Signal      12             173.2        68.22

5/6     tend us ga 3d, au Su aed asl            Original                            164.03       69.78
                                                Synthesized Signal   2              164.03       76.29
                                                Synthesized Signal   12             164.03       76.29
                                                Residual Signal      2              164.56       71.64
                                                Residual Signal      12             166.74       69.02

                                                Synthesized Signal   2              161.56       75.99
                                                Synthesized Signal   12             161.56       75.99
                                                Residual Signal      2              164.59       71.55
                                                Residual Signal      12             171.2        68.96


It has been found that

a) Pitch of the synthesized speech signal is approximately the same as that of the
original speech signal, both in the case of 2 poles and 12 poles.

b) Pitch of the residual speech signal is, in most cases, increased as compared to the
original speech signal.

c) Intensity of the synthesized speech signal is increased as compared to the original
speech signal, both in the case of 2 poles and 12 poles.

d) Intensity of the residual speech signal is, in most cases, decreased for both 2 poles and
12 poles as compared to the original speech signal. Also, the intensity for p = 12 is
lower than that for p = 2.
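
For reference, pitch values such as those in Table 6.7 are commonly obtained from the lag at which the autocorrelation of a voiced frame peaks. The following is a minimal sketch of that idea, not the procedure used by Praat or by the thesis's programs. The 75-500 Hz search range is the pitch range quoted for the Praat snapshots in Section 6.9; the 8000 Hz sampling rate, the function names, and the synthetic 160 Hz test tone are assumptions for illustration.

```python
import math


def estimate_pitch(x, fs, fmin=75.0, fmax=500.0):
    """Pitch in Hz from the lag that maximizes the frame's autocorrelation."""
    lag_min = int(fs / fmax)
    lag_max = int(fs / fmin)
    best_lag, best_val = lag_min, float("-inf")
    for lag in range(lag_min, lag_max + 1):
        val = sum(x[n] * x[n + lag] for n in range(len(x) - lag))
        if val > best_val:
            best_lag, best_val = lag, val
    return fs / best_lag


def make_tone(f0, fs, n):
    """Hypothetical voiced frame: fundamental plus two weaker harmonics."""
    return [math.sin(2 * math.pi * f0 * i / fs)
            + 0.5 * math.sin(2 * math.pi * 2 * f0 * i / fs)
            + 0.25 * math.sin(2 * math.pi * 3 * f0 * i / fs)
            for i in range(n)]


fs = 8000.0  # assumed sampling rate
frame = make_tone(160.0, fs, 800)
print(estimate_pitch(frame, fs))
```

Because the estimate depends only on the periodicity of the frame, a synthesized signal that preserves the original period would be expected to give the same pitch as the original, as reported in finding a).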


6.8.4. SPECTROGRAPHIC ANALYSIS: SPECIAL SENTENCES


This subsection concentrates on the analysis of the ten special sentences involving the
special phonemes (8, 5, 3, &, %, 3, U, =, J, J), as well as the examples of the extended
pronunciation. It has been found that the formant frequencies of the synthesized speech
sentences are approximately the same as those of the original sentences. However, the
formant frequencies of the residual signals are considerably higher. In the case of
p = 12 (LPC), the first formant of the residual signal is consistently greater than that
for p = 2, as expected, and the second formant is greater in most cases. The resulting
values are shown below (Table 6.8). Pitch and intensity variations are shown in Table 6.9.


Table 6.8: Special Sentences Analysis (First two formants: 10 sentences)

Sr. No. Sentence                                Signal               No. of poles   Formant I (Hz)   Formant II (Hz)
                                                Synthesized Signal   2              622.17           1482.81
                                                Synthesized Signal   12             622.17           1482.81
                                                Residual Signal      2              887.84           1824.15
                                                Residual Signal      12             956.8            1886.91

2/7     fgets gus Dastest2deeal                 Original                            530.95           1785.74
                                                Synthesized Signal   2              530.94           1785.71
                                                Synthesized Signal   12             530.94           1785.71
                                                Residual Signal      2              748.45           1928.29
                                                Residual Signal      12             958.22           1912.12

3/8     Sd-OT Bs, SATA Saas al                  Original                            566.71           1484.24
                                                Synthesized Signal   2              566.68           1484.24
                                                Synthesized Signal   12             566.68           1484.24
                                                Residual Signal      2              848.51           1829.93
                                                Residual Signal      12             938.81           1896.69

4/10    3am, J 3303 de aaa Sass fit aa!         Original                            561.35           1419.51
                                                Synthesized Signal   2              561.4            1419.77
                                                Synthesized Signal   12             561.4            1419.77
                                                Residual Signal      2              869.22           1786.6
                                                Residual Signal      12             938.62           1864.63

5/18    as Hes aeet, HS HS ure                  Original                            490.24           1545.77
                                                Synthesized Signal   2              490.29           1545.61
                                                Synthesized Signal   12             490.29           1545.61
                                                Residual Signal      2              835.08           1883.84
                                                Residual Signal      12             942.47           1881.45

                                                Synthesized Signal   2              490.38           1628.65
                                                Synthesized Signal   12             490.38           1628.65
                                                Residual Signal      2              762.39           1904.47
                                                Residual Signal      12             958.56           1901.31

7/22    wat HRY sd sg So Hue, sing wane         Original                            594.35           1381.16
        fH3T Il
                                                Synthesized Signal   2              594.28           1381.03
                                                Synthesized Signal   12             594.28           1381.03
                                                Residual Signal      2              926.21           1706.66
                                                Residual Signal      12             948.46           1859.55

8/9     mwopEeduas za, uss $ u@ sga"ll          Original                            521.41           1576.03
                                                Synthesized Signal   2              521.41           1576.07
                                                Synthesized Signal   12             521.41           1576.07
                                                Residual Signal      2              773.82           1848.37
                                                Residual Signal      12             954.68           1893.68

                                                Synthesized Signal   2              523.13           1539.26
                                                Synthesized Signal   12             523.13           1539.26
                                                Residual Signal      2              805.05           1855.05
                                                Residual Signal      12             947.34           1880.75

10/83   dye gagegat da ygez!                    Original                            496.08           1414.82
                                                Synthesized Signal   2              496.08           1414.82
                                                Synthesized Signal   12             496.08           1414.82
                                                Residual Signal      2              863.04           1834.9
                                                Residual Signal      12             941.66           1903.78





Table 6.9: Special Sentences Analysis (Pitch and Intensity: 10 sentences)

Sr. No. Sentence                                Signal               No. of poles   Pitch (Hz)   Intensity (dB)
WA      wound SHS aw aus se ffs!                Original
                                                Synthesized Signal   2              133.4        73.28
                                                Synthesized Signal   12             133.4        73.28
                                                Residual Signal      2              135.19       71.03
                                                Residual Signal      12             134.62       69.34

2/7     feed dus Dati asiadecdl                 Original                            158.76       66.47
                                                Synthesized Signal   2              158.76       74.35
                                                Synthesized Signal   12             158.76       74.35
                                                Residual Signal      2              161.32       70.9
                                                Residual Signal      12             161.73       66.26

                                                Synthesized Signal   2              158.74       73.77
                                                Synthesized Signal   12             158.74       73.77
                                                Residual Signal      2              161          71.3
                                                Residual Signal      12             160.39       70.05

4/10    3am, J 33038 de aaa Sasr fio aa]        Original                            162.48       73.32
                                                Synthesized Signal   2              162.48       74.42
                                                Synthesized Signal   12             162.48       74.42
                                                Residual Signal      2              163.09       72.25
                                                Residual Signal      12             157.15       69.39

                                                Synthesized Signal   2              147.75       74.45
                                                Synthesized Signal   12             147.75       74.45
                                                Residual Signal      2              148.15       70.28
                                                Residual Signal      12             150.92       67.96

                                                Synthesized Signal   2              155.8        74.97
                                                Synthesized Signal   12             155.8        74.97
                                                Residual Signal      2              158.79       71.48
                                                Residual Signal      12             159.63       69.05

                                                Synthesized Signal   2              156.86       75.82
                                                Synthesized Signal   12             156.86       75.82
                                                Residual Signal      2              158.86       71.1
                                                Residual Signal      12             160.79       69.45

                                                Synthesized Signal   2              178.87       75.15
                                                Synthesized Signal   12             178.87       75.15
                                                Residual Signal      2              184.94       72.71
                                                Residual Signal      12             182.43       70.25

                                                Synthesized Signal   2              164.94       76.29
                                                Synthesized Signal   12             164.94       76.29
                                                Residual Signal      2              165.13       72.7
                                                Residual Signal      12             167.9        70.89

10/83   HE gat egat da ase!                     Original                            200.8        68.35
                                                Synthesized Signal   2              200.8        75.87
                                                Synthesized Signal   12             200.8        75.87
                                                Residual Signal      2              202.5        73.61
                                                Residual Signal      12             202.93       71





It has been found that

a) Pitch of the synthesized sound signal is approximately the same as that of the original
sound signal, both in the case of 2 poles and 12 poles.

b) Pitch of the residual sound signal is increased as compared to the original sound signal.

c) Also, for the residual sound signal, the pitch is lower with 2 poles than with 12 poles
in most cases.

d) Intensity of the synthesized sound signal is increased as compared to the original sound
signal, both in the case of 2 poles and 12 poles.

e) Intensity of the residual sound signal with 2 poles is increased, and with 12 poles is
decreased, as compared to the original sound signal.
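
The intensity differences reported in this section can be related to amplitude scaling with a short sketch. It assumes that intensity is the mean-square level of the signal in dB relative to a reference pressure of 2e-5 Pa (the convention used by Praat); the function names and the reference constant are assumptions, not part of the thesis's programs. The worked figure uses the original and synthesized intensities of the first nasal sentence in Table 6.5 (72.62 dB and 76.06 dB).

```python
import math

P_REF = 2e-5  # assumed reference pressure in Pa (Praat-style dB scale)


def intensity_db(x):
    """Mean-square level of the samples x in dB relative to P_REF."""
    mean_square = sum(s * s for s in x) / len(x)
    return 10.0 * math.log10(mean_square / (P_REF * P_REF))


def amplitude_ratio(delta_db):
    """Amplitude scaling factor that corresponds to a level change in dB."""
    return 10.0 ** (delta_db / 20.0)


# Level change between the original and synthesized signal, Table 6.5, sentence 1.
delta = 76.06 - 72.62
print(round(delta, 2), amplitude_ratio(delta))
```

On this reading, a rise of a few dB in the synthesized signal corresponds to an overall gain in amplitude rather than to a change in spectral shape, which is consistent with the unchanged formant and pitch values.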


6.9. SPECTROGRAPHIC ANALYSIS (PRAAT) 


The next 10 snapshots have been generated by using Praat. In all the four-colored
snapshots given below:

Blue color represents Pitch (Range: 75-500 Hz),

Black color represents Spectrograms (Range: 50-400 dB),

Yellow color represents Intensity (Range: 0-4000 Hz), and

Red color represents Formant I to Formant IV (Range: 0-4000 Hz).

These snapshots are presented in the following 10 figures (Fig. 6.1-Fig. 6.10). In the
FFT figures with LPC smoothing, the FFT is shown in black, and the LPC-smoothed
spectrum is shown in color (green or red).

a) Fig. 6.1-6.2: Original sentence (4 colored items described above); FFT with LPC
smoothing

b) Fig. 6.3-6.4: Residual, p = 2 (4 colored items); FFT with LPC smoothing

c) Fig. 6.5-6.6: Residual, p = 12 (4 colored items); FFT with LPC smoothing

d) Fig. 6.7-6.8: Synthesized, p = 2 (4 colored items); FFT with LPC smoothing

e) Fig. 6.9-6.10: Synthesized, p = 12 (4 colored items); FFT with LPC smoothing


These snapshots analyze the sentence (# 11) representing the nasal phoneme [n] = [ਨ],
transcribed below in the Punjabi language (Gurmukhi script), IPA, and PUNJARPAbet:
SA sae sa SS eQSA! 
nimmo di nti de nokk ’te n3 téke logge. 


/N THh MM OW - DIY - N UWnh - DEY - N AH KK -’T EY - N AOn - 
T AHn K EY - L AH GG EY/ 

Figure 6.1: Sentence # 11 (Original waveform, Pitch, Spectrogram, Intensity, Formants)
Figure 6.3: Sentence 11 (Residual Signal 2 poles, Pitch, Spectrogram, Intensity, Formants)
Figure 6.5: Sentence 11 (Residual Signal 12 poles, Pitch, Spectrogram, Intensity, Formants)
Figure 6.7: Sentence 11 (Synthesized Signal 2 poles, Pitch, Spectrogram, Intensity, Formants)
Figure 6.9: Sentence 11 (Synthesized Signal 12 poles, Pitch, Spectrogram, Intensity, Formants)
Figure 6.10: Sentence 11 (Synthesized Signal 12 poles: FFT and LPC Smoothing)


6.10. SPECTROGRAPHIC ANALYSIS (MATLAB) 


The next set of five figures (Fig. 6.11-Fig. 6.15), consisting of the speech waveforms,
pitch contours, V/UV decision, and FFT, is generated by the first MATLAB program.
These figures concentrate on sentence # 1. This sentence represents the vowel form
[ੳ] and is transcribed below in Gurmukhi script, IPA, and PUNJARPAbet:


Ge Gs! Gue fag wl? Cast ez! 

oe Ullu! igda kI6 #? Uggli phar! 

/OW EY - UH LL UW! UWn hG D AA - K TH OWn - AEn? 
UHn GG LIY - PH AH RH!/ 


In Fig. 6.11, we can see that the intensity line is absent in the pitch contour for the
unvoiced regions, whereas the intensity line is present (proportional to the loudness) for
the voiced regions (represented by rectangles in one of the sub-plots in Fig. 6.11), as
expected.
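
A voiced/unvoiced (V/UV) decision of the kind plotted in Fig. 6.11 is often made frame by frame from the short-time energy and the zero-crossing rate. The sketch below illustrates that common approach; the method used by the thesis's MATLAB program is not described in this section, so the frame length, both thresholds, the 8000 Hz sampling rate, and the synthetic test signal are all assumptions.

```python
import math
import random


def short_time_energy(frame):
    """Mean-square value of the frame."""
    return sum(s * s for s in frame) / len(frame)


def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if a * b < 0.0)
    return crossings / (len(frame) - 1)


def voiced_flags(x, frame_len=240, energy_thr=0.01, zcr_thr=0.25):
    """True for frames that are loud and cross zero rarely (treated as voiced)."""
    flags = []
    for start in range(0, len(x) - frame_len + 1, frame_len):
        frame = x[start:start + frame_len]
        flags.append(short_time_energy(frame) > energy_thr
                     and zero_crossing_rate(frame) < zcr_thr)
    return flags


def make_signal(fs=8000, seed=0):
    """Hypothetical signal: 60 ms of a 150 Hz tone, then 60 ms of weak noise."""
    rng = random.Random(seed)
    voiced = [math.sin(2 * math.pi * 150.0 * n / fs) for n in range(480)]
    unvoiced = [0.05 * rng.uniform(-1.0, 1.0) for _ in range(480)]
    return voiced + unvoiced


print(voiced_flags(make_signal()))
```

Frames flagged as voiced are the ones for which a pitch value is meaningful, which matches the rectangles marking the voiced regions in the figure.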

Figure 6.11: Pitch analysis sentence # 1 (original waveform, pitch contour, V/UV regions)
Figure 6.12: Sentence # 1 Waveforms (Original, Residual: 2 poles, Synthesized: 2 poles)
Figure 6.13: Sentence # 1 Amplitude Spectrum (Original, Residual: 2 poles, Synthesized: 2 poles)
Figure 6.14: Sentence # 1 Waveforms (Original, Residual: 12 poles, Synthesized: 12 poles)
Figure 6.15: Sentence # 1 Amplitude Spectrum (Original, Residual: 12 poles, Synthesized: 12 poles)


The next set of two figures (Fig. 6.16 and Fig. 6.17) is generated by the second
MATLAB program. These figures concentrate on the first 1024 samples (to get a better
insight into the FFT) of sentence # 3. This sentence represents the vowel form [ੲ]
and is transcribed below in Gurmukhi script, IPA, and PUNJARPAbet:


thad 13 fied, fear ket FS fanrG| 


isor ote Idor, Ikvaja Itta ethe lao. 
/TY SH AHR - AHT EY - IHn D AHR, JH K V AHn J AA - IH T:T: AAn - 
EY TH EY - LIH AA OW/ 


When we closely examine and compare the figures in this section (e.g., Fig. 6.16 and Fig.
6.17), we can see that the FFT of the synthesized speech waveform for p = 12 is closer to
the FFT of the original speech waveform than the FFT of the synthesized speech waveform
for p = 2. This is expected because the synthesized speech for p = 12 is much more
intelligible, and much closer to the original waveform, than the synthesized speech for
p = 2.
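
The visual comparison of FFT plots described above can also be expressed numerically. The following is a minimal sketch, not the second MATLAB program: it computes a single-sided amplitude spectrum of the first 1024 samples with a small radix-2 FFT and measures how far two spectra are apart with a root-mean-square log-spectral distance. The function names, the distance measure, and the floor value `eps` are assumptions chosen for illustration; the frame length of 1024 samples is taken from the text.

```python
import cmath
import math


def fft(x):
    """Recursive radix-2 FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    even = fft(x[0::2])
    odd = fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * math.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


def amplitude_spectrum(x, n_fft=1024):
    """Single-sided amplitude spectrum of the first n_fft samples (zero-padded)."""
    seg = list(x[:n_fft]) + [0.0] * max(0, n_fft - len(x))
    spec = fft(seg)
    mags = [abs(c) / n_fft for c in spec[: n_fft // 2 + 1]]
    for k in range(1, n_fft // 2):
        mags[k] *= 2.0  # fold the negative-frequency half onto the positive half
    return mags


def log_spectral_distance(a, b, eps=1e-12):
    """RMS difference in dB between two amplitude spectra of equal length."""
    diffs = [20.0 * math.log10((u + eps) / (v + eps)) for u, v in zip(a, b)]
    return math.sqrt(sum(d * d for d in diffs) / len(diffs))


# Hypothetical check signal: a unit-amplitude cosine centred on bin 64.
tone = [math.cos(2 * math.pi * 64 * n / 1024) for n in range(1024)]
spectrum = amplitude_spectrum(tone)
print(max(range(len(spectrum)), key=spectrum.__getitem__))
```

With the original and the two synthesized versions of a sentence loaded as sample lists, a smaller distance to the original for p = 12 than for p = 2 would correspond to the observation made from Fig. 6.16 and Fig. 6.17.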

Figure 6.16: FFT (Original Sentence # 3 and Synthesized Speech 2 poles)
Figure 6.17: FFT (Original Sentence # 3 and Synthesized Speech 12 poles)


6.11. EVALUATION OF PUNJARPAbet BY USING CORPUS


While designing the new phonetic alphabet PUNJARPAbet in Chapter II, new
symbols have been designed for the following 16 Punjabi speech sounds:


COIDIWeBVSEOCZeaga4gg37 
kh, x, g, k/g, ch, cj, n/ hi, th, t/d,n,t,t/d, ph, p/b,1,r 


The differentiation between 4/H, dI/sT is quite blurry even in the Punjabi language. The
reason is that five (H, 4, SI, A, S) of the six ‘dot (bindi) in the foot letters’ in the
antim toli were introduced into the Gurmukhi script in the previous century to represent
foreign sounds belonging to the English, Persian, and Arabic languages. Therefore, the
analysis concentrates only on the remaining thirteen letters: YW, &. ¥, 2, 6, v, =, 3, U, g,
3, &%, Z. These 13 letters can be classified into three distinct categories for the purpose
of this evaluation:


1. Tonemes: W, ¥, 2, 4, J 
2. Nasals: 2, = 
3. Special Sounds: &, 6, 3, ¢, 3, F 


The new phonetic coding scheme PUNJARPAbet developed in Chapter II has 
been primarily designed to encode the text and speech corpora developed in Chapter IV. 
This corpus has at least twenty special features. PUNJARPAbet is capable of coding 
examples related to all special features of the new corpus. PUNJARPAbet is an all 
upper-case coding scheme and is consistent with the all upper-case version of the famous 
coding scheme ARPAbet developed by the Advanced Research Projects Agency 
(ARPA). The new scheme is very easy to follow, and is the most suitable scheme for 
typing on an ordinary computer keyboard as well as an ordinary typewriter. Unlike many 
other schemes such as the famous International Phonetic Alphabet (IPA), the 
PUNJARPAbet is free from the laborious, irritating, and time-consuming necessities for 
dealing with the special symbols. It has been clearly demonstrated in the following 
sections that PUNJARPAbet is a versatile, efficient and convenient coding scheme for 
not only the Punjabi language, but also any language which has sounds similar to the 
ones found in the Punjabi speech. 


Even though the corpus consists of at least twenty original features, only three of 
these features are described and exemplified in this section using PUNJARPAbet so as to 
demonstrate that the newly designed PUNJARPAbet is versatile enough to handle a wide 
variety of issues related to the representation of the text and speech corpus in the Punjabi
language and Gurmukhi script. 


6.11.1. Examples of uniqueness 


Each item in the corpus is unique. Each sentence/boli in the corpus has been 
carefully designed so as to convey something new and special about the Punjabi 
language, Gurmukhi script, Malwai dialect, phonetics, matras, vowels, and consonants by 
using a wide variety of vocabulary, idioms and expressions. In particular, a special 
feature called Anupras Alankar (ਅਨੁਪ੍ਰਾਸ ਅਲੰਕਾਰ) has been used throughout the
development of the corpus. Anupras means “alliteration”. According to Wikipedia: “In 
language, alliteration is the repetition of a particular sound in the prominent lifts (or 
stressed syllables) of a series of words or phrases.” Consequently, the Anupras Alankar 
means the repetitious usage of a word or a letter to enhance the beauty of a piece of 
literature. In Indian languages and literature, the Anupras Alankar has been used for 
centuries. The idea behind the usage of the Anupras Alankar in this work is that the 
repetitious use of a letter will enable the user to thoroughly analyze and synthesize any 
particular phoneme in a single item. Two examples from the new corpus (predominantly
making use of the letters ਜ and ਦ respectively) to illustrate Anupras Alankar using the
PUNJARPAbet are given below:


Faet Ht AE, FAS St ATS SI 

jogdi modgdi jago, jltt li jito ne. 

/J AHGDIY -M AHhGDIY -J AAGOW, JIH TT-LIY-JIY TOW - 
N EY/ 


CdS tI a, te Hae Wal 
dorshon dé dipo, de de sdk de gere. 
/D AHR SH AHN-DEYh- DIY POW - DEY - DEY - SH AOn K - DEY - 
GWY RHEY./ 
6.11.2. Examples of tones 


The following examples illustrate the occurrence of each of the five tonemes (ਘ,
ਝ, ਢ, ਧ, ਭ) in the initial, middle (medial), as well as the final positions (words in bold
below) in a single item:
we S Hy He HS Ay oT or ot fer 
kUdde ne khdgure mar mar sdg da kdga kora Ila. 


/K1 UH DD EY - N EY - KH AHn hG UW R EY -M AAR-M AAR- 
S AHn hG - D AA - K AHn hG AA - K AHR AA - LIH AA/ 
H3S! HY, Sat & Baa Seu] oa | 


mdjelo! mdjj cotti da cdgra cat pot nabero. 
/M AH hJ AE L OW! M AH bJJ, CHI OW T:T: TY - D AA — 
CHI AH G RH AA - CHI AH T:T: - P AH T:T: - N AH B EY RH OW/ 


Ud 2 ger o sa cfr, feast cc ag Ui! 

tagge de ciidia na vadd kUddla, tIbori tet kor dii! 

/T:1 AH GG EY - DEY - CH UWn hD: TY AAn - N AA - V AH hD:D: - 
K UHn hD: JH AA, T:1 IHn B AHRTY - T: AE T: - K AHR - D: UWn!/ 


TIS-IM S, Sfoor sas, ATS" | 

parot-pumi nu, lobla paget, soraba. 

/P1 AA R AH T -- PLUW MITY - N UWn, L AH hBIY AA - PLAH G AHT, 
S AHR AA bB AA./ 


fad viasint sur 3 ssirib, fad fisdist sehr 


kIvé cabolia pua te patijia, kIvé mI] g6ba bonia. 
/K TH V EYn - CH AAn hB AH LIY AAn - PI UW AA - T EY - 
Pl AA TTY JTY AAn, K IH V EYn- M IH LG OW bB AA - B AHN: TY AAn./ 


These items have been especially designed in the corpus to illustrate the five tonemes
(voiced aspirates: ਘ, ਝ, ਢ, ਧ, ਭ) occurring at various positions in the words. If the
tonemes occur in the initial position of words and stressed syllables, then the tonemes
represent the unaspirated voiceless consonants (ਕ, ਚ, ਟ, ਤ, ਪ) followed by a low tone. For
the non-initial (middle/medial or final) positions of these tonemes, there is a high tone
on the vowel before the voiced unaspirated consonants (ਗ, ਜ, ਡ, ਦ, ਬ), or a low tone on the
vowel following them. Twenty-nine words from the six items
used in this chapter exemplifying the use of tonemes as explained here are tabulated in
Table 6.10 along with their IPA representation.
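
The positional rule described above can be written as a small lookup. This is only a sketch of the rule as stated in the text, not part of the corpus tools: the function name and the position labels are hypothetical, and the Gurmukhi letters in the two mappings are the standard voiceless and voiced unaspirated counterparts of the five voiced aspirates, which is an assumption wherever the printed letters are unclear.

```python
# Voiced aspirate -> voiceless unaspirated consonant (word-initial realization).
INITIAL = {"ਘ": "ਕ", "ਝ": "ਚ", "ਢ": "ਟ", "ਧ": "ਤ", "ਭ": "ਪ"}

# Voiced aspirate -> voiced unaspirated consonant (medial or final realization).
NON_INITIAL = {"ਘ": "ਗ", "ਝ": "ਜ", "ਢ": "ਡ", "ਧ": "ਦ", "ਭ": "ਬ"}


def realize_toneme(letter, position):
    """Return (consonant, tone description) for a toneme letter at a position.

    position is one of "initial", "medial", or "final" (hypothetical labels).
    """
    if letter not in INITIAL:
        raise ValueError("not one of the five toneme letters: " + letter)
    if position == "initial":
        return INITIAL[letter], "low tone on the following vowel"
    if position in ("medial", "final"):
        return (NON_INITIAL[letter],
                "high tone on the preceding vowel, or low tone on the following vowel")
    raise ValueError("unknown position: " + position)


for pos in ("initial", "medial", "final"):
    print(pos, realize_toneme("ਘ", pos))
```

A transcription tool could apply such a lookup before emitting the consonant symbol and the tone mark; how PUNJARPAbet itself encodes the tone is defined in Chapter II and is not reproduced here.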


6.11.3. Examples of extended pronunciation (@Hae" Gade) 


Examine the pair of words italicized and in bold in the following items (sentences 
and bolis) from the corpus: 


TAH STAs doa JITAT fafsr| 
hakom ne hds nui horas ke hora hira jlttla. 


/H AA K AHM-N EY -H AHnS - N UWn- H AHR AA AH - KEY - 
H AHR AA-HIY RAA-J TH TT IH AA/ 


Table 6.10: Illustration of tonemes (Punjabi and IPA)


Initial 


yc 
kUdda 


Medial 
duygr 


khdgura 


sdg da k3ga 





Bcl, yaa, S| 


cotti, cdgra, cat 


He 


m9jj 





dar, fact 


tagga, tIbori 


ciidia, kUddla 


ze 
vodd 





tutdd, taroma 


ugg 
poddora 


padi, pdd, vadd 





TIS, Ht, SAS 


paroat, pumi, paget 


Star 
lébla 


AWS 


soraba 








pua, patijia 


= 


cabalia 











fisdis 
mllgoba 


WAITHY SJ J ST HUT, sit ware fis ll 

kdra mag pdr parti tera m4ggora, cajra karao de mlttora. 

/K1 AH RH AA - M AA hG - PI AH R - P1 AH R UWn - T EY R AA - 
M AH hGG AHR AA, CHI AAn J R AAn - KI AH RH AA AH - DEY - 
M IH TT AHR AA/ 


IH! ga FI MSN 

pora ji! phuk paras li? 

/P1 AH R AA - J TY! PH UW K - Pl AHR AA AH - LIY?/ 
fie ufewa d yo Head HeaTstont| 

mido mUtlar ne mt motkas ke motka pdnnla. 


/M IHn D OW - M UH T: IH AA R - N EY - M UWnh - M AH T: K AA AH - 
K EY- M AHT: K AA - Pl AH NN IH AAn/ 


HoH SS! AH" Haw fam | 

sorma vala sorma sormao gla. 

/SH AHRM AAn- V AAL AA - SH AHR M AA - SH AHR M AA AH - 
G IH AA/ 


Several items in the corpus have been designed to demonstrate that in these pairs of
words, the meaning of the word changes with the addition of // at the end of the word,
and its pronunciation is also extended. The six pairs of words (from the examples used in
this chapter) with the changed meanings are tabulated in Table 6.11 along with their
meanings and the IPA representation.


Table 6.11: Illustration of Extended Pronunciation

No.   Word      IPA      Meaning                Extended word   IPA       Meaning
1     go        hora     green                  sayy            horaa     to defeat
2     war       kara     pitcher                wamy            kérao     to chisel into some design
3     3g        pora     brother                sayy            porao     to fill (air)
4     Heat      motka    pitcher                Heamy           motkas    to gesture with eyes
5     HoH       sorma    Sharma (a last name)   HoH             sormao    to feel shy
6     wyazar    khorka   noise                  wyazamy         khorkao   to knock


6.12. GRAPHICAL EVALUATION OF PUNJARPAbet 


One of the main objectives of this research was the development of a new
phonetic alphabet consistent with the ARPAbet for the phonetic transcription of the
Punjabi language (objective 1). It can be an equally useful tool for natural language
processing as well as digital speech processing in the Punjabi language. One of the
critical issues we addressed in this research is balancing the requirements of speech
processing against those of linguistic characteristics. In speech processing, researchers
consider that the quality of the data plays a more dominant role than the quantity of the
data. In this section, in order to evaluate the performance
of the new coding scheme PUNJARPAbet, we have used a number of different
evaluation criteria. In this work, we captured the speech signal using a head-worn
microphone. Prior to the experimental evaluations, we ensured that all speech material
was clearly understood by the participants. The participants were drawn from several
categories with different levels of schooling, with five people in each category. We
provided sufficient time to each participant to become familiar
with the experimental equipment. The evaluation results in four evaluation areas are
described below.


6.12.1. Speaking and Typing Rates 


We observed the speaking and typing rates of different categories of participants
(speakers-cum-typists) for various items in the corpus using the spontaneous Punjabi
language. The participants represented five
different levels of schooling: ordinary uneducated, elementary/primary school, high
school, university level, and postgraduate. We define the speaking and typing rates as the
number of words spoken or typed per five seconds. We selected a predefined set of ten
words and requested each participant to record and type them. We observed an average of
4.8 words typed and 7.2 words spoken. It was also observed that the higher the level of
education, the higher the number of words spoken or typed. Figure 6.18 illustrates the
comparison of speaking and typing rates across different educational levels.





Figure 6.18: Comparison of speaking and typing rates


6.12.2. Typing Rates of Different Coding Schemes 


We also compared the typing rates of selected items from the new corpus using three
different alphabets. We compared the typing rates using PUNJARPAbet with the typing
rates of the same items using the existing approaches of IPA and ARPAbet. We
observed that, on average, 4.8 words were typed in the case of PUNJARPAbet, 3.2 words
in the case of IPA, and 3.8 words in the case of ARPAbet. Figure 6.19
illustrates the comparison of typing rates across different coding schemes and educational
levels. As compared with IPA or ARPAbet, the participants consistently demonstrated
better performance when PUNJARPAbet was employed. Whereas PUNJARPAbet was
the clear winner (first), IPA was the last. The apparent reason is the fact that IPA
requires special symbols whereas PUNJARPAbet does not.


6.12.3. Uniqueness 


Vowel-dominant sentences included more vowel symbols (in the Punjabi language,
these symbols are also known as matras or accessory signs or diacritical marks or vowel
signs) as compared with the consonant-dominant sentences. Sentences with a wide
variety of vocabulary, idioms and expressions were used in this evaluation. We define the
typing rate as the number of vowels and consonants typed per five seconds. We selected a
predefined set of ten sentences in each of the two categories: vowel-dominant and
consonant-dominant sentences. We requested each participant to type these selected
items. We observed that, on average, 2.6 words were typed in the case of IPA, 3
words in the case of ARPAbet, and 4 words in the case of
PUNJARPAbet for vowel-dominant sentences. Figure 6.20 illustrates the comparison of
typing rates of vowel-dominant sentences across these three different coding schemes and
educational levels.





Figure 6.19: Comparison of typing rates of different coding schemes
Figure 6.20: Comparison of typing rates of vowel-dominant sentences


Figure 6.21 illustrates a similar comparison of typing rates for consonant-dominant
sentences for various categories of participants. We observed that, on average, 3.4
words were typed in the case of IPA, 5 words in the case of ARPAbet, and 6
words in the case of PUNJARPAbet for consonant-dominant sentences.
Figure 6.21: Comparison of typing rates of consonant-dominant sentences


The evaluation results in the different evaluation areas, graphically illustrated in four
figures (Fig. 6.18-6.21), are summarized in this paragraph. The evaluation confirms the
well-known fact that the speaking rate of a person is faster than the typing rate
(Fig. 6.18). Figure 6.19 confirms that the typing rates of the participants are
faster when PUNJARPAbet is used as the typing tool as compared with the traditional
coding schemes (IPA or ARPAbet). The primary reason for PUNJARPAbet being faster than
IPA is the fact that IPA requires special symbols for typing vowel symbols (also known as
matras or accessory signs or diacritical marks or vowel signs).

To further investigate the performance of the new coding scheme, we designed two more
experiments. Whereas the
words used in the first two experiments (see Fig. 6.18 and 6.19) are a general mixture of
consonants and vowels, the sentences used in these two experiments are split into two
categories: vowel-dominant and consonant-dominant sentences. Vowel-dominant
sentences include more vowel symbols as compared with the consonant-dominant
sentences. The typing rates for the consonant-dominant sentences (Fig. 6.21) are the
fastest as compared with the general sentences (Fig. 6.19) and the vowel-dominant
sentences (Fig. 6.20). The typing rates for the vowel-dominant sentences are the slowest.

The average typing rates for general sentences, vowel-dominant sentences, and
consonant-dominant sentences illustrated in Fig. 6.19, 6.20 and 6.21 respectively are
further summarized in Table 6.12, which shows that PUNJARPAbet is the fastest of the
three schemes in every category. PUNJARPAbet is comparatively better
among all categories of speakers because it does not require the laborious, irritating, and
time-consuming typing of the special symbols for vowels, nasalization, tones, and the
diacritical marks. It is especially true in comparison with IPA where two diacritical marks
over the ten vowel symbols are needed for the Punjabi language in Gurmukhi script.
Further, PUNJARPAbet is also more efficient than ARPAbet because ARPAbet (single
symbol version) requires frequent switching between lower-case and upper-case, and at
present there are no symbols in the ARPAbet to represent at least 16 Punjabi speech
sounds (see Table 2.7, Chapter II).

Even though the new phonetic coding scheme PUNJARPAbet developed in this work has
been designed to encode the new text and speech corpora developed in Chapter IV, it
should be clear that PUNJARPAbet is a versatile, efficient and convenient coding
scheme for any language which has sounds similar to the ones found in the Punjabi
speech.


Table 6.12: Summary of average typing rates (words per five seconds)


Coding Scheme    General    Vowels    Consonants

IPA              3.2        2.6       3.4
ARPAbet          3.8        3         5
PUNJARPAbet      4.8        4         6
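
The averages in Table 6.12 can be turned into relative speed-ups and into the more familiar words-per-minute figure with a few lines of arithmetic. The sketch below uses only the values from the table and the definition of the rate as words per five seconds; the dictionary layout and function names are illustrative assumptions.

```python
# Average typing rates from Table 6.12, in words per five seconds.
RATES = {
    "IPA": {"General": 3.2, "Vowels": 2.6, "Consonants": 3.4},
    "ARPAbet": {"General": 3.8, "Vowels": 3.0, "Consonants": 5.0},
    "PUNJARPAbet": {"General": 4.8, "Vowels": 4.0, "Consonants": 6.0},
}


def words_per_minute(words_per_five_seconds):
    """Convert a rate in words per five seconds to words per minute."""
    return words_per_five_seconds * 12.0


def speedup(scheme, baseline, category):
    """Ratio of the typing rate of scheme to that of baseline in one category."""
    return RATES[scheme][category] / RATES[baseline][category]


for category in ("General", "Vowels", "Consonants"):
    print(category,
          round(speedup("PUNJARPAbet", "IPA", category), 2),
          round(speedup("PUNJARPAbet", "ARPAbet", category), 2),
          round(words_per_minute(RATES["PUNJARPAbet"][category]), 1))
```

Expressed this way, the advantage of PUNJARPAbet over IPA is larger than its advantage over ARPAbet in all three categories, which agrees with the ranking described in the text.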




















6.13. CONCLUSION 


The first part of this chapter is dedicated to the graphical analysis of the speech
sentences synthesized in Chapter V. The software packages Praat and MATLAB have
been used extensively to study the spectral characteristics of the synthetic speech along
with the formant frequencies, pitch, and intensity. The speech synthesized with p = 12
was almost as intelligible as the original speech. As expected, the residual signal for
p = 2 was much closer to the original sentence (because less information has been
extracted from it) than the residual signal for p = 12 (because more information has
been extracted from it). Even though 25 sentences (grouped into four distinct categories
of vowels, nasals, tonemes, and special sentences) have been synthesized, only a limited
number of representative graphs have been included in this thesis to keep the volume
under control. Spectrographic analysis in this chapter also confirms the conclusions of
Chapter V. In particular, the FFT of the speech waveform synthesized with p = 12 is
closer to the FFT of the original speech waveform than the FFT of the waveform
synthesized with p = 2. This is expected because the speech synthesized with p = 12 is
much more intelligible, and much closer to the original, than the speech synthesized
with p = 2. In the graphs for the pitch analysis, the intensity line is absent in the
pitch contour for the unvoiced regions, whereas it is present (proportional to the
loudness) for the voiced regions, as expected. Whenever Praat and MATLAB were used to
plot graphs for the same variables (e.g., waveforms, pitch, and FFT), both packages
generated similar results, as expected.
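
The remark about residual signals can be illustrated numerically. The sketch below is not the analysis code of this work: it uses a synthetic three-tone frame rather than corpus speech, solves the autocorrelation normal equations directly with NumPy, and reports the residual energy relative to the frame energy. All names (`lp_coefficients`, `residual_energy_ratio`) and the test signal are assumptions made for illustration.

```python
import numpy as np


def lp_coefficients(frame, p):
    """LP coefficients a[1..p] from the autocorrelation normal equations."""
    n = len(frame)
    r = np.array([np.dot(frame[: n - k], frame[k:]) for k in range(p + 1)])
    # Toeplitz matrix R[i, j] = r[|i - j|] built by fancy indexing.
    idx = np.abs(np.subtract.outer(np.arange(p), np.arange(p)))
    return np.linalg.solve(r[idx], r[1:])


def residual_energy_ratio(frame, p):
    """Energy of the LP residual divided by the energy of the frame."""
    a = lp_coefficients(frame, p)
    inverse_filter = np.concatenate(([1.0], -a))  # A(z) = 1 - sum_k a_k z^-k
    residual = np.convolve(frame, inverse_filter)
    return np.sum(residual ** 2) / np.sum(frame ** 2)


# Synthetic frame (an assumption, not corpus data): three tones plus weak noise,
# 30 ms at 8 kHz, Hamming windowed.
fs = 8000
t = np.arange(240) / fs
rng = np.random.default_rng(0)
signal = (np.sin(2 * np.pi * 500 * t)
          + 0.6 * np.sin(2 * np.pi * 1500 * t)
          + 0.3 * np.sin(2 * np.pi * 2500 * t)
          + 0.01 * rng.standard_normal(t.size))
frame = signal * np.hamming(t.size)

ratios = {p: residual_energy_ratio(frame, p) for p in (2, 8, 12)}
for p, ratio in ratios.items():
    print(f"p = {p:2d}: residual energy / frame energy = {ratio:.6f}")
```

A larger ratio means that more of the original signal remains in the residual, which is the sense in which the residual for p = 2 stays closer to the original than the residual for p = 12.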


The second part of this chapter is dedicated to the graphical evaluation of the new
phonetic alphabet (coding scheme) PUNJARPAbet designed in Chapter II. The results in
four evaluation areas have been illustrated in four figures and one table in the
evaluation section. Although PUNJARPAbet is capable of coding all examples related to
all special features of the new corpus, only three example areas (uniqueness, tonemes,
and extended pronunciation) along with two tables have been used to demonstrate that
PUNJARPAbet is better than the other phonetic alphabets, e.g., IPA and ARPAbet.
PUNJARPAbet is an all upper-case coding scheme. It is consistent with the all upper-case
version of the famous ARPAbet scheme. The new scheme is very easy to follow, and is the
most suitable scheme for an ordinary computer keyboard as well as an ordinary typewriter.
Unlike many other schemes (e.g., IPA), PUNJARPAbet is faster and free from the laborious,
irritating, and time-consuming necessities for dealing with the special symbols for
vowels, nasalization, tones, and inserting diacritical marks (especially where two
diacritical marks over the ten vowel signs are needed for the Punjabi language in
Gurmukhi script). The graphical analysis summarized in figures and tables in this chapter
demonstrates that PUNJARPAbet is a versatile, efficient and convenient coding scheme not
only for the Punjabi language, but also for any language which has sounds similar to the
ones found in Punjabi speech.


CHAPTER VII
SUMMARY AND CONCLUSIONS


This thesis describes the topic: Speech Analysis and Synthesis of the Punjabi 
Language. In this chapter, we summarize the whole work, highlighting the original 
contribution and main conclusions of this project, followed by the future directions. 


7.1. SUMMARY AND CONCLUSION 


The original contributions and conclusions of this research can be summarized as 
follows: 


7.1.1. Phonetic Alphabet Development 


The first issue in the present work is the actual representation of the phonemes. In
linguistics, the International Phonetic Alphabet (IPA) designed by the International
Phonetic Association is used to represent the phonemes. However, the major limitation of
this representation is that it needs some special symbols that are not readily available
on computer keyboards [15, p. 801]. To simplify this problem, the ARPAbet representation
came into existence as a result of the ARPA SUR project [15, pp. 527-529]. Since then, a
number of coding schemes have been used in the literature for international as well as
Indian languages. The need for a new coding scheme becomes obvious when we investigate
the existing coding schemes such as IPA, ARPAbet, ISCII, SAMPA (and its extended
versions X-SAMPA and SAMPROSA), INSROT and wx-Roman. Most of these schemes prove
unsuitable for the Punjabi language, either because they cannot be typed on a
conventional typewriter or a computer keyboard, or because they are used inconsistently
(different books use different symbols for the same elements, so the notation can differ
even within the IPA). The laborious, irritating, and time-consuming necessities for
dealing with the special symbols for vowels, nasalization, tones, and inserting
diacritical marks in most of these schemes (especially where two diacritical marks over
the ten vowel signs are needed for the Punjabi language in Gurmukhi script) confirm the
need for a new coding scheme such as the one designed in this work. In this work, a new
phonetic alphabet consistent with the ARPAbet phonetic transcription has been developed
for the Punjabi language. We have named the new alphabet PUNJARPAbet by combining
PUNJ + ARPAbet, the same way as the name PUNJAB is derived by combining PUNJ + AAB.


The idea of the new scheme originated to address the following four issues: 
1. The existing schemes (e.g., IPA) almost always require special symbols. It is not an
easy task to find these symbols, or to combine the appropriate diacritical marks to
these symbols.


2. While transferring data consisting of these symbols from one computer to another, it is
not unusual to face the portability problems. Consequently, dealing with these symbols
is laborious, irritating, and time-consuming.


3. No two authors agree on the representation of the Punjabi speech sounds as shown by 
Table 2.6 consisting of Gill & Gleason [43], Bahri [13], and Dulai and Koul [34]. 


4. The symbols for at least 16 letters of the Punjabi language/Gurmukhi script do not 
exist in the ARPAbet. 


Table 2.7 summarizes the new coding scheme. Table 2.8 gives the complete 
template of the new coding scheme PUNJARPAbet. The new scheme is very easy to use 
since all the symbols used in this scheme are readily available on an ordinary computer 
keyboard as well as on an ordinary typewriter. Consequently, the PUNJARPAbet is the 
most suitable coding scheme for transcribing the Punjabi language text corpora. 
PUNJARPAbet is capable of coding examples related to all special features of the new 
corpus. 


PUNJARPAbet is an all upper-case coding scheme and is consistent with the all 
upper-case version of the famous coding scheme ARPAbet developed by the Advanced 
Research Projects Agency (ARPA). The new scheme is very easy to follow, and is the 
most suitable scheme for typing on an ordinary computer keyboard as well as an ordinary 
typewriter. Unlike many other schemes such as the famous International Phonetic 
Alphabet (IPA), the PUNJARPAbet is free from the laborious, irritating, and
time-consuming necessities for dealing with the special symbols. By conducting the graphical
evaluation, it has been clearly demonstrated in this work that PUNJARPAbet is a 
versatile, efficient and convenient coding scheme not only for the Punjabi language, but 
also for any language which has sounds similar to the ones found in the Punjabi speech. 
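
The keyboard argument can be made concrete with a small check. The sketch below is only an illustration: it treats printable ASCII as a stand-in for "an ordinary computer keyboard or typewriter" (an assumption, not a definition used in this work), and the function name `untypable_chars` is hypothetical. It compares the ARPAbet-style symbol CH with the IPA affricate written with a tie bar.

```python
import string

# Assumption: printable ASCII approximates what an ordinary keyboard can type.
KEYBOARD_CHARS = set(string.ascii_letters + string.digits + string.punctuation + " ")


def untypable_chars(transcription):
    """Characters outside KEYBOARD_CHARS, in order of first appearance."""
    seen = []
    for ch in transcription:
        if ch not in KEYBOARD_CHARS and ch not in seen:
            seen.append(ch)
    return seen


arpabet_style = "CH"              # ARPAbet-style symbol for the affricate
ipa_style = "t\u0361\u0283"       # IPA: 't' + combining tie bar + esh

print(untypable_chars(arpabet_style))
print(untypable_chars(ipa_style))
```

An all upper-case, ASCII-only transcription such as `/N AA - H OW V EY/` passes this check unchanged, whereas the IPA form needs two characters that are not on a plain keyboard.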


7.1.2. New Corpus Development 


A new text and speech corpus for the Punjabi speech processing has been 
developed in this work. This is the second most important contribution of this study. 
Speech sentences spoken by the adult male and female Punjabi speakers from ten districts 
of the Malwa region of the Punjab state of India were recorded for this corpus. This 
corpus consists of two parts: The first part consists of 52 sets of the Punjabi speech 
sentences. The second part of the corpus consists of 35 sets of single line folk songs 
(bolis). This corpus will be one of the most suitable corpora to be used in the Digital 
Speech Processing techniques of the Punjabi language (such as the classical linear 
prediction analysis and synthesis technique), because of the many new features of this 
corpus. 


Some original features of this corpus are mentioned below:


1. Each item of the corpus is unique. Each sentence or boli in the corpus has been
carefully designed so as to convey something new and special about the Punjabi
language [13-14, 43, 48, 95-97], Malwai dialect [103, 112, 140], Gurmukhi script
[13-14, 43, 48, 95-97, 103, 112, 140], phonetics, vowels, consonants, and vowel signs
(aka matras or accessory signs or diacritical marks) by using a wide variety of
vocabulary, idioms and expressions. In particular, a special feature called Anupras
Alankar (ਅਨੁਪ੍ਰਾਸ ਅਲੰਕਾਰ) has been used throughout the development of the corpus
(Anupras means “alliteration”).


2. All categories and phonetic characteristics of the speech sounds (e.g., voiced, unvoiced 
or voiceless, nasals, non-nasals, aspirated, unaspirated) have been recorded, and 
analyzed. 


3. The words and sentences used in the corpus have been selected from books written by 
the Sahitya-Academy Award winner writers of the Malwa region (The Sahitya- 
Academy Award is the highest literary award administered by the Government of 
India). 


4. Punjabi is a tonal language. Phonemes involving tones in a language are known as
tonemes. Several sets in the corpus have been designed to illustrate these tonemes:
ਘ, ਝ, ਢ, ਧ, ਭ (voiced aspirates), and the conjunct consonant ਹ.


5. Several sentences in the corpus have been designed to demonstrate reduplication.
Reduplication implies that the meaning of a word can change with the addition of the
sign called ੱ (addak), as in ਪਤਾ (means address) and ਪੱਤਾ (means leaf). Gill and Gleason
[43] call it gemination: “Gemination is written by the sign /addak/ above and before
the consonant to be doubled.” Bahri [13] calls this phenomenon “Long (or double)
consonants”.


6. Several sentences in the corpus (set 45) have been designed to demonstrate that in
these pairs of words, the meaning of the word changes with the addition of // at the
end of the word, and its pronunciation is also extended. A new term Extended
Pronunciation (BHA€" FATE) has been coined by the author for this feature of the
Punjabi language.
7. Some lines of the popular folk songs have been used in the original form. 
8. Some lines of the popular folk songs have been used in the modified form. 


9. Some single line folk songs (bolis) have been recorded and analyzed.


10. Some new bolis have been written (the author is a creative writer, and has published
five books of original Punjabi poetry). The new bolis written by the author end
with the symbol double dandi (||) to emphasize the fact that these bolis are original.


11. Some theth pendu shabads (rustic words) and expressions that are slowly fading out
of everyday use have been included in the corpus. Fading letters (e.g., ਙ, ਞ) have
also been included.


12. Speech sentences spoken by a variety of speakers (based on sex and age) have been
employed: adult male and female speakers in the age range of 24 to 70 years have been
recorded for the speech corpus.


13. Speech sentences spoken by 15 different Punjabi speakers from different villages
and cities of ten different districts of the Malwa region of the Punjab state of India
have been recorded.


14. In addition to Malwain (a female of the Malwa region), the names of about two dozen 
villages and cities of the Malwa region prominently appear in some of the bolis to 
highlight the fact that this work concentrates on the Malwa region of the Punjab state 
in India. 


15. Since the Malwai people pronounce the names of places in slang form instead of their 
actual names, many items in the corpus include the slang names of cities and villages. 
Places named Barnala, Faridkot, Jagraon, Moga, Mukatsar, Raikot, Ludhiana, and 
Amritsar have been used in the corpus in their slang form. 


16. Many popular names and nick-names for the Malwai males and females are included 
in various parts of the corpus. 


17. Three young freedom fighters Shaheed Kartar Singh Sarabha, Shaheed Bhagat Singh, 
and Shaheed Udham Singh Sunam (Shaheed means Martyr), who sacrificed their 
lives during the independence struggle of India before 1947 are very popular amongst 
the youth of the Malwa region. The names of all three of these martyrs have been 
respectfully mentioned in several sentences and bolis of the corpus. 


18. The names of the trees, weeds, and crops (either grown by seeding or naturally grown 
as weeds in the Malwa region) have been included in many items of the corpus. 


19. Many birds, insects, and animals found in the Malwa region (and frequently
mentioned in everyday conversations and communications of the Malwai
population) have been included in many items of the corpus.


20. Males and females of the Malwa region are fond of wearing jewelry items generally
made of silver and gold. Many jewelry items have been prominently mentioned in
many sentences and bolis of the new corpus.


The fact that the new corpus is very rich and versatile due to a wide variety of its 
linguistic and cultural features mentioned here makes it an ideal corpus for any serious 
speech processing work in the Punjabi language. 


7.1.3. Linear Prediction Analysis and Synthesis of the Punjabi Speech


This is the third major objective of this project. In the LP model of speech
production, the current speech sample is predicted by a linear combination of the
past p speech samples. Each of the four control parameters (LP coefficients,
voiced/unvoiced decision, pitch period for voiced sounds, and gain) can be determined
directly from the speech waveform using more than one method, thereby making each
analysis/synthesis project substantially different from every other project. In this work,
the autocorrelation method (for computing the LP coefficients), the ZFF and RAPT
algorithms (for computing the V/UV decision and pitch), and the RMS method (for computing
the gain) have been used for computing the four control parameters of the linear
prediction model of speech production. Final results (including informal perceptual
listening tests, spectral characteristics, low bit rate transmission, and savings in the
computation) reported in this work strongly confirm that the LPC technique is capable of
generating high quality synthesized Punjabi speech output.
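
As an illustration of the autocorrelation method, the following sketch computes the LP coefficients of one frame with the Levinson-Durbin recursion. It is a minimal example under stated assumptions, not the code used in this work: the frame is synthetic, the function names are hypothetical, and the gain is taken as the RMS value of the prediction error, which is one common reading of an RMS-based gain and may differ from the definition used in Chapter V.

```python
import numpy as np


def autocorrelation(frame, p):
    """Autocorrelation lags r[0..p] of one windowed frame."""
    n = len(frame)
    return np.array([np.dot(frame[: n - k], frame[k:]) for k in range(p + 1)])


def levinson_durbin(r, p):
    """Solve the autocorrelation normal equations by the Levinson-Durbin recursion.

    Returns (a, err): a[k-1] multiplies s[n-k] in the predictor, and err is the
    prediction error energy remaining at order p.
    """
    a = np.zeros(p + 1)  # a[0] is unused; a[1..p] hold the coefficients
    err = r[0]
    for i in range(1, p + 1):
        # Reflection coefficient of order i.
        k = (r[i] - np.dot(a[1:i], r[i - 1:0:-1])) / err
        prev = a.copy()
        a[i] = k
        for j in range(1, i):
            a[j] = prev[j] - k * prev[i - j]
        err *= 1.0 - k * k
    return a[1:], err


def lp_analyze(frame, p=12):
    """LP coefficients and a gain estimate for one Hamming-windowed frame."""
    windowed = frame * np.hamming(len(frame))
    r = autocorrelation(windowed, p)
    a, err = levinson_durbin(r, p)
    gain = np.sqrt(err / len(frame))  # assumed gain: RMS of the prediction error
    return a, gain


# Illustrative frame (not corpus data): 30 ms at 8 kHz, two decaying tones plus weak noise.
fs = 8000
t = np.arange(240) / fs
rng = np.random.default_rng(0)
frame = (np.exp(-20 * t) * (np.sin(2 * np.pi * 700 * t) + 0.5 * np.sin(2 * np.pi * 1800 * t))
         + 0.01 * rng.standard_normal(t.size))
coeffs, gain = lp_analyze(frame, p=12)
print(len(coeffs), gain > 0)
```

The recursion avoids an explicit matrix inversion and yields the prediction error energy at every order as a by-product, which makes it convenient for comparing different values of p.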


The quality of the processed speech was judged by informal perceptual
listening tests throughout this work. We used informal perceptual listening tests
mainly due to the absence of any other facilities for speech tests in most Digital Signal
Processing laboratories. The second reason is that the author considers informal
perceptual listening tests to be the best criterion. The human ear, above everything else,
still remains the decisive mechanism for the perception of any speech. In addition to
many other speech researchers, Ghosh et al. recently confirmed experimentally that the
auditory filter bank in the human ear is a near-optimal speech processor for efficient
speech communication between human beings [129, p. 710].


Additional insight has been provided by the spectrographic analysis conducted 
using Praat and MATLAB where the formants, intensity, pitch, V/UV contours, and FFT 
of the speech waveforms have been plotted. It has been concluded that: 


1. Number of poles p = 12 is sufficient to provide an adequate representation of speech
signals. A slight degradation in the quality of the synthesized speech is noticeable
when p is decreased to 8, and although poor in quality, speech can be synthesized with
p as low as 2 (Table 5.2).


2. A Hamming window in the autocorrelation method gives better results than the
implicit rectangular window.


3. Due to the limitations of the simplified all pole model, the quality of the nasal, plosive
and voiced fricative sounds is not as good as that of the voiced, unvoiced and
non-nasal sounds.


4. No significant improvement has been noticed in the overall performance of the speech
synthesizer when different pitch detection algorithms were used. This agrees with the
well-known result that no single pitch detection algorithm is uniformly top-ranked
across all environments [90, 99].
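
To make the role of a pitch detection algorithm concrete, the sketch below shows a plain autocorrelation pitch estimator with a simple voicing decision. It is deliberately neither ZFF nor RAPT, the two algorithms used in this work, but a much simpler stand-in. The search range, the voicing threshold of 0.5, and all function names are illustrative assumptions.

```python
import numpy as np


def autocorr_pitch(frame, fs, fmin=60.0, fmax=400.0):
    """Return (f0_hz, strength) from the highest autocorrelation peak in the lag range."""
    x = frame - np.mean(frame)
    n = len(x)
    lag_min = int(fs / fmax)
    lag_max = min(int(fs / fmin), n - 1)
    r = np.array([np.dot(x[: n - k], x[k:]) for k in range(lag_max + 1)])
    if r[0] <= 0.0:
        return 0.0, 0.0  # silent frame: no pitch, no periodicity
    lag = lag_min + int(np.argmax(r[lag_min:lag_max + 1]))
    return fs / lag, r[lag] / r[0]


def is_voiced(strength, threshold=0.5):
    """Crude V/UV decision from the normalized autocorrelation peak (assumed threshold)."""
    return strength >= threshold


fs = 8000
t = np.arange(320) / fs  # one 40 ms frame

# Synthetic voiced frame: harmonics of 100 Hz (not corpus data).
voiced = (np.sin(2 * np.pi * 100 * t)
          + 0.5 * np.sin(2 * np.pi * 200 * t)
          + 0.3 * np.sin(2 * np.pi * 300 * t))
f0, voiced_strength = autocorr_pitch(voiced, fs)

# Synthetic unvoiced frame: white noise.
noise = np.random.default_rng(1).standard_normal(t.size)
_, noise_strength = autocorr_pitch(noise, fs)

print(round(f0, 1), is_voiced(voiced_strength), is_voiced(noise_strength))
```

Estimators of this kind differ mainly in how they behave on noisy, creaky, or rapidly changing speech, which is why different algorithms can rank differently across recording conditions.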


This project is a successful attempt to address the important issue of synthesizing 
high quality speech (or generating high quality synthesized speech or synthetic speech) in 
the Punjabi language. 


The nature of this work is highly interdisciplinary. This work requires the 
knowledge of Computing Science, Information Technology, Electrical Engineering, 
Mathematics, Statistics, Linguistics, Phonetics, Punjabi Language, Gurmukhi Script, and 
the Malwai Dialect. 


7.2. FUTURE DIRECTIONS 


There are several possible directions and extensions of this work. The six most
important extensions are mentioned below.


1. This work concentrates on the Malwai dialect of the Punjabi language. Punjabi 
University (Patiala, India) published a list of 31 dialects of the Punjabi language. It 
will be interesting to deal with the corpus designs and analysis/synthesis of the other 
dialects of the Punjabi language. 


2. The comparisons of the results obtained for the projects on different dialects can also 
generate significant results. This comparison will be an important extension of this 
project. 


3. This work concentrates on the Gurmukhi script of the Punjabi language. The 
comparisons of the results obtained for two different scripts (Gurmukhi and 
Shahmukhi) can also lead to important projects. 

4. This work concentrates on the linear prediction analysis and synthesis of the Punjabi
language. Other researchers have synthesized the Punjabi speech using synthesis
techniques other than the linear prediction technique. The comparisons of the results
obtained for various synthesis techniques can be another fruitful extension of this
work.


5. A comparative study of the classification of the Punjabi speech sounds, and phonemes 
by various authors can be an interesting project. 


6. The speech and text corpus designed in this work consists of approximately 300 items 
(sentences, and bolis), and at least twenty original features. It is our intent to convert 
this into an on-going project, and keep adding more items, and more unique features to 
the corpus. 


Keywords: Computational Linguistics, Linear Prediction Analysis and Synthesis of Speech,
Linear Prediction of Speech, LPC, Punjabi Speech Processing, Digital Speech Processing,
Linguistics, Phonetics, Punjabi Language, PUNJARPAbet, ARPAbet, IPA, Phonetic
Alphabet, Coding Scheme, Text Corpus, Speech Corpus, Punjabi Corpus, Punjabi Corpora.


APPENDIX A 
CORPUS (52 SETS) 


Ge Gs! Guer fag ul? Cast ez! 


oe Ullu! tigda kI6 @? Uggli phor! 
/OW EY - UH LL UW! UWn hG D AA - K IH OWn - AEn? 
UHn GG LIY - PH AH RH!/ 


Gud Gun fry HoH wT 8s feu) 


ont udem slg sUnam da bUtt dIkhao. 
/OWh N UWn - UW hD AHM -S IHn hG-S UHN AAM-D AA-BUHTT- 
D IH KH AA OW/ 


wa! ut ag, vig vlad wie & up] 


omor! é kar, 5 enok ddar le a. 
/AA M AHR! AEn - K AHR, AOh - AEN AH K - AHn D AHR - L AE- AA/ 


wate Je frsrete, thr ade weet frie] 


aie hatth mIlaie pesa kadd jalebi Iaie. 
/AA TY EY -H AH TTH- MIHL AATY EY, P AES AA - K AHhD:D: - 
J AHLEY BIY -LIHAATY EY/ 


thd we eg, Fada fet FS Surg! 


isor ote Idor, Ikvaja Itta ethe Hao. 
/TY SH AHR - AH T EY - IHn D AHR, IH K V AHn J AA - JH T:T: AAn - 
EY TH EY - LIH AA OW/ 


fees test ‘tg we eed we’ TSS At 

Idor Ilti ‘igor ae dolIddor jae’ gaUda si. 

/TIHn DAHR-IHLTIY ‘TY SHAHR- AA EY -D AHLIHDD AHR- 
J AA EY’ -G AH UHn D AA - STY/ 


isor, Ikk Ikk korke Ikvajja Itta ethe Ia! 


/TY SH AHR, IH KK - TH KK - K AHR K EY - IH K V AHnJ AA - 
TH T:T: AAn - EY TH EY - LTH AA/ 


Ad AY d Hed @ fg 'v 2 AS Ac Ss
sere sodu ne sUdor de sIr ’c che sott sotia choddia.
/SH EY REY -S AHn hD UW -NEY -S UHn D AHR-DEY -SJTHR- 
°CH - CHH EY - S AH TT - S OWT: TYn - CHH AH D:D: TYn/ 


(9) Ae Het & A AHS? yet SA US 


seve seni ne so somvar sdgoni lassi piti. 
/S EY V EY -S AEN: TY -N EY -S AO-SOWM V AAR- 
S AHn hG AHN: TY -L AHSSTIY - PTY TIY/ 


(10) fa HS-Aea Ho fru, foe Ae" AH! 
kItthe sdt-sevak sUcca sIpahi, kItthe sona sum! 


/K TH TTH EY- S AHn T -- S EY V AH K - S UH CHCH AA - 
STHP AA HIY, K IH TTH EY -S AON: AA -S UW M!/ 


(11) 35. og, fhsdsst use es og dst vse!


bohu, hImmot hehor de halt vala ra holi calde! 
/B AH H UW, H IHn MM AH T-HEY HAHR-DEY-HAHLT- 
V AAL AA-R AAh-H AOLTY -CH AHLD AE!/ 


(12) Jat Jatin, Je 3 ait? 


hékori hukmla, hadd ho gi? 
/H AEn K AH RH TY- HUH K MIH AAn, H AH DD -H OW - GIY?/ 


(13)/(148) TAH STA S IIMA TT Se fifo 


hakom ne his nt horao ke hora hira jittIa. 
/H AA K AH M-N EY - H AHn S - N UWn-H AHR AA AH - KEY - 
H AHR AA-HIY RAA-J IH TT IH AA/ 


(14) a6. facet aHstd ast asst aast 3 Sait nt? 


kéri komli ne kali kurti kSdoli ’te tdgi €? 
/KEYh RH TY -K AHMLIY-NEY-KAALTIY-KUHRHTIY - 
K AHn hD OW LIY -’T EY - T: AHn GITY - AE?/ 




(15) Ie Az, a at ot ad, As wie d_i 


kaka keval, ka ka na kor, kutte ant kuttt. 
/K AA K AA - KEY V AHL, K AAn - K AAn-N AA- K AHR, 
K UHTT EY - AAn NX UWn - K UH T:T: UWn/ 


(16) afer, fa as dt ga, fae as ast 
kella, kItthe kol di ku ku, kItthe kod kabaddi! 


/K AE LTH AH, K IH TTH EY -K OWL-DIY - K UW-K UW, 
K IH TTH EY- K AO D: - K AH B AH D:D: TY/


(17) 47. HAH ut Sus de ys Dust


xosma khani ne khora khotta khui ’c pa ’ta. 
/X AH SM AAn - KH AAN: TY -N EY - KH AHR AA - KH OW T:T: AA - 
KH UWh - ’CH - P AA - ’TAA/ 


(18) Ge ofan, 4g our, dts ya 


oe khottla, khoru na pa, khir kha. 
/OW EY - KH OW TT ITH AA, KH AOR UW -N AA -P AA, KHIYR - 
KH AAh/ 


(19) uga fy ds a as & fost 3 


khork slg khdde di khed da khIdari he. 
/KH AH RH K - S IHn hG - KH AHn D: EY - DIY - KH EY D: -D AA - 
KH IHD: AARIY - H AE/ 


(20) yoy ftua you Het fa yer YSU? 


khUccA ’c khIccke khUrcona mara kI khUda khUrpa? 
/KH UH CHCH AAn - ’CH - KH TH CHCH K EY - KH UHR CH AHN: AA - 
M AA R AAn - K JH - KH UHnh D: AA - KH UHR P AA?/ 


(21) a8. deasgssda nad 3a ya


géde gabboru ne pago ke jagire t6 gUr khada. 
/G EYn DEY -G AH hBB AHR UW - NEY - Pl AAG OW - K EY - 
J AH GITY REY - T OWn - G UH RH - KH AA hD AA/ 


(22) TSS & Ts-forrat SF ars Ea sfemr| 


golador ne gtir-glani t6 gara dUdd chokla. 
/G AHL AA hD AH RH - N EY - G UW hRH -- GIH AA NIY - TOWn - 
G AA hRH AA - D UHhDD - CHH AH K JH AA/ 


(23) fied dia fron, df 28 Ft dis aise! 


gidor géda jla, g3 vele i gall golde! 
/G IHn D AHR - G AEn D: AA - J TH AA, G AOn - V EY LEY - ITY - 
GAHLL-GAOLD AE!/ 


(24) 49. Ws atts det aos ug 


kd] kiti ta kotene nal kéri. 
/K1 AOL -KTIY TIY - T AAn - K] OW T: AHN: EY -N AAL - 
Kl AH RH UWn/ 


(25) Us y~Bes st fu sa, whit o ag SH 


kol kUloné ta klo chak, kissi na kora Ii! 
/K1 OW L - K1 UH L AHN AEn - T AAn - KI IH OW - CHH AH K,
K TY hSS TY -N AA- K AHR AA -LTYn!/


(26) ue S dys HS HS HY eT ok aT far 


kUdde ne khdgure mar mar sdg da kdga kora IIa. 
/K1 UH DD EY - N EY - KH AHn hG UWR EY -M AAR-M AAR- 
S AHn hG - D AA - K AHn hG AA - K AHR AA - LTH AA/ 


(27) Sfumrs Pa whims et usr ur fam 


boglar mége kUmlar da kora kha gla. 
/B AH GIIH AA RH -M EY hGEY - KI UH MIH AAR- D AA- 
KI] OW RH A - KH AA - GJIH AA/ 


(28) Wet WS AUS SH YS we Ful 


kdta kar sdgol vala kha katt dige. 
/Kl AHn T: AA - K]1 AHR -S AHn hG OW L - V AAL AA - KH UWh- 
KI AH T:T: - D: UWn hG AE/ 


(29/149) _- Wat Hy sd Sg Sa Hu, sing wary e Hiss 


kara mag pdr parti tera mdggora, cajra kdrao de mittora. 

/K1 AH RH AA - M AA hG - Pl AHR - Pl AHR UWn- TEY R AA - 
M AH hGG AH R AA, CH] AAn J R AAn - K] AH RH AA AH - DEY - 
M IH TT AHR AA// 


(30) 810. ase seaGde! to vise a ul


kSnon chonkaUdie! dUdd ’c minona na pa. 
/K AHn NX AH N: - CHH AH N: K AA UHn DIY EY! D UH hDD - ’CH - 
M IYn NX AHN: AA -N AA - P AA/ 


(31) Vil. BVZUIs varus ve A wAS ve?


cetu core ne cokctidor calao ke carta cd? 
/CH EY T UW -CH AOR EY -N EY - CH AH K CH UWn hD AHR - 
CH AH L AA AH - K EY - CH AA hRHT AA - CH AHn D?/ 


(32) os! dd ol ot es, yar UU Sal! 
cora! cé cé ci ci chodd, cukné ’c car thokt! 
/CH OW R AA! CH AEn - CH AEn - CH TYn - CH TYn - CHH AH D:D:, 
CH UW K N: EYn - ’CH - CH AAR - TH: OW K UWn/ 
(33) oe deo, foes dar du, da st ase! 
caca cddrla, cltt cdga rakkh, cUlla ta balde! 


/CH AA CH AA - CH AHn DR IH AA, CH IH TT - CH AHn G AA - 
R AH KKH, CH UH hLL AA - T AAn- B AHL D AE!/


(34) 812. east as, sect at 98?


chajj ta bole channi ki bole? 
/CHH AH JJ - T AAn -B OWL EY, CHH AAN: NITY - K ITY -B OWLEY?/ 


(35) efont 2 as 0 dte-Ay featur 
chollla de khet ’c6 chiba sopp nIkkolla. 


/CHH OW LL IH AAn - D EY - KH EY T - °>CH OWn - CHH IYn B AA -- 
S AH PP -N JH KK AHL IH AA/ 


(36) fee o Suz '0 est act & de e2l 


chide ne chopper ’c6 cheti cheti che kocchu phore. 
/CHH THn D EY - N EY - CHH AH PP AH RH - ’CH OWn - CHH EY TITY - 
CHH EY T IY - CHH EY - K AH CHCHH UW - PH AH RH EY/ 


(37) 13. Ade yet wd, fas StH oI
jogdi mdgdi jago, jItt li jito ne. 
JAH GDIY -MAHhGDIY -J AAG OW, JIH TT -LIY -JIY TOW - 
NEY./ 


(38) Ast fHES & Ws Aa S HIS ST att 
jotti jIdal de juthe jagg nui jagal logg gi. 
/J OW TT IY -J IHn D AHL-DEY - J UW TH: EY - J AH GG- N UWn - 
J AHn G AA L-L AH GG - GIH/ 


(39) ors tt As vy, Hor St or 

jUale di jott ’c, jaa i jaa. 

/J UW AA LEY -DIH-J AH TT -’CH, J UWn AAn- LY - J UWn AAn/ 
(40) fed oO Horet 6 AF ava AAI 


jidoro di jathani ne jore jUak j3mme. 
/JTHn DAHROW - DIY -J AH TH: AAN: IY - NEY - J AO RHEY - 
J UW AA K - J AHn MM EY/ 


(41) 31k. HSS Hy, Sct ot Baa ve U| oes! 

mdjelo! majj cotti da cdgra cat pat nabero. 

/M AH hJ AE L OW! M AH bJJ, CHI OW T:T: TY -D AA - 

CHI AH G RH AA - CHI AH T:T: - P AH T:T: - N AH B EY RH OW/ 
(42) B83", H8d fr, va de fers fart 

cddda, médjeru jla, cdjju pon glij gle. 


/CHI AHn D:D: AA - M AH hJ EY R UW - J TH AA, CHI AH JJ UW — 
P AON: -GIH hJJ - GIH AE/


(43) 215. ve guor WHE, G3 age fiitt Asta (AE = Ie)


cdnne ni c3nna samna, utte kanon sIddi sotir. 
/CH AHn NN EY - N UWN - CH AHn NN AA -S AAhMN: AA, UTT EY - 
K AAn NJ AHN: -S IH hDD IY -S AH TITY R/ 


(44) dz agdt 3 eas a 


jan cardi he albele di. 
/J AHn NJ - CH AH hRH DIY -H AE- AHL BEY LEY -DIY/ 


(45) 16. ga-esdd cctadt wis wAHS Sfp 


tUk-tek ne tatiri anti osman thommla. 
/T: UH KK -- T: EY R- NEY - T: AHT: TYHRIY - AAn NX UWn - 
AHS M AAN - TH AH hMM IH AA/ 


(46) fea wt cfataur, de wig dee a HI 


tik ja télla tottu ant titone na mar. 
/T: IH K-J AA - T: AEh LTH AA, T: AH T:T: UW - AAn NX UWn - 
T: ITY T: N: EY -N AA-M AAR/ 


(47) cet, och & ced a uz ad 


tetue tutla da totber na khora koro. 
/T: EY T: UW EY, T: UW T: TY AAn- D AA - T: AH T: BAER -N AA - 
KH AH hRH AA - K AH R OW/ 


(48) S17. Sad dod, oa ofent vy Sfamr fart 


thakor thothera, ’thara thanla ’c thoggla glé! 
/TH: AA K AHR - TH: EY TH: AH R AA, ’TH: AA R AAn - 
TH: AA N: IH AAn - ’CH - TH: AH GG JH AA - GIJH AE!/ 


(49) Od-O0 SS, Sa Ta SA BB fd 


thu-tha chodd, thik ho ke thUkk nal ré. 
/TH: UWh -- TH: AAh - CHH AH DD, TH: TY K -H OW - K EY - 
TH: UH KK - N AA L- R AEh/ 


(50) Bo o gs, uky ge! 

cuth na bol, pig cut! 

/CH1 UW TH: -N AA - B OWL, P TYn hG - CHI UW T:!/ 
(51) 318. Gorgeero, 3g dass!

vaddIa palvana, dollu ’cé dUdd dopph le! 


/V AH D:D: IH AA - Pl AHL V AAN AA, D: OWL UW - °CH OWn - 
D UH hDD - D: AH PPH - L AE!/


(52) soas', 683305, sa az


dorakla, thdd t6 dor na, dad kédd. 
/D: AH R AA KL AA, TH: AHn hD: - T OWn - D: AH R - NAA, D: AHn D: - 
K AH hD:D:/ 


(53) ade, 3g eg St as SU! 


dogra, doru vajju ta dad vi pou. 
/D: OW GR AA, D: AOR UW -- V AH JJ UW- T AAn - D: AHn D: - VTY - 
P AH UW!/ 


(54) 219. das, wad yt fa sour a ge? 


takkona, yari ¢ kI chollla da vadd? 
/T:1 AH KK AHN: AA, Y AARTY - AE - K IH- CHH OW L IH AAn - 
D AA - V AHhD:D:?/ 


(55) fess wat 6 Us yal fa ave act? 


tiddol taddi ne t3dd tidi kI kad kSddi? 
/T:1 1H D:D: AH L- T:1 AA D:D: IY - N EY - T:1 AH D:D: - T:l UWn D: TY - 
K IH- K AA hD: -K AH hD:D: IY?/ 


(56) fess", wd Vat aft? Ss ge vi 


tlddola, 4 toli kaddle? latta vadd di! 
/T:1 TH D:D: AH L AA, AAh - T:] OW L TY- K AH hD:D: TH AE? 
L AH TT AAn - V AH hD:D: - D UWn!/ 


(57) ug oust sd dc del 


téru ne tol de dade vatt kddde. 
/T: LEY RUW -NEY - T:1OW L - DEY - D: AA hD: EY - V AHT:T: - 
K AH hD:D: EY/ 


(58) ad ud 6, fés o Ser ud HE eset nig! 
kéddo téru nu, tidd na vdddIa tuhi sone tolki ani! 
/K AH hD:D: OW - T:1 EY R UW - N UWn, T:1 TH D:D: - N AA - 


V AH hD:D: IH AA - T:| UW HTY - S AHN: EY - T:1 OW L K TY- 
AAn NX UWn!/ 


(59) Ud 2 gdm o de dfenr, fdadt ce ag ui 
tdgge de ciidia na vadd kUddla, tlbori tet kor di! 
/T:1 AH GG EY - DEY - CH UWn hD: IY AAn- N AA - V AH hD:D: - 
K UHn hD: IH AA, T:1 IHn B AH RIY - T: AET: - K AH R - D: UWn!/ 


(60) He-Ae SH ag ta ae, HY deer fegd! 
mUdd6-sUdd6 k3mm kor tag nal, mogge kdddda phirdé!
/M UH hD:D: OWn -- S UH hD:D: OWn - K AHn MM - K AHR - T:] AHn G -
N AA L, M AH hGGEY - K AH hD:D: D AA - PHTH RD EYn!/ 


(61) ©20. Hea ’S HEA, OT HEA 


monke ’te monka, tha monka. 
/M AHN: K EY -’T EY -M AHN: K AA, TH: AAh - M AHN: K AA/ 


(62) 3e, Te U ose 3 uel S fae! 


péne, rano di nonod te dorani ni mIlné! 
/P] AEN: EYn, R AA N: OW - DIY -N AHN: AHD-T EY - 
D AHR AAN: TY -N UWn-M IH LN: AEn!/ 


(63) wet 6 USE OS WET UE! 


tani nti pUrane thane jana pene. 
/T:1 AAN: TY - NUWn-P UHR AAN: EY - TH: AA N: EY -J AAN: AA - 
P AEN: AE/ 


(64) 321. 3dAN, J 3308 Fe add Sas fig ad! 


tarsemo, tii totkore bad korke tokkola sidda kor. 
/T AHR S EY M OW, T UWn - T OW T K AH RH EY - B AHnD - 
K AHR K EY -T AH KK AHL AA - SIH hDD AA - K AH R/ 


(65) Bo UII 3S, sar St ve 


ton da tarU tota, torza ti lave. 
/T AHN-DAA-TAARUW-T OWT AA, T AHR Z AAn - T IHh - 
L AA V EY./ 


(66) e far ’'d, a ST 

na tInna ’c, na téra ’c. 

/N AA - T THn NN AAn - ’CH, N AA - T EY hR AAn - ’CH/ 
(67) 33 '3 fa f830 88 ah 

tut ’te tInn tIttor bethe si. 

/TUWT -’T EY -T IHn NN - TIH TT AHR-B AETH: EY - STY/ 
(68) OF 3o o f330d, SF 3d o SdeAl 


na tota na tIttor, na tor na torbuz. 
/IN AA - TOW T AA -N AA - TIH TT AHR, N AA -TAHR-N AA- 
T AHR B UW Z/ 


(69) 334, 33a, Zsa, Fs, J AHS! 


tutok, tutok, tutok, tutia, he jamalo! 
/T UW T AHK, T UW T AH K, T UWT AH K, T UW TIY AAn,
HEY -J AHM AAL OW!/


(70) 822. Bas weag qaser di 


thothla thanedar thathlaiida he. 
/TH AH TH L AA - TH AAN: EY D AAR - TH AH THL AOn D AA - H AE/ 


(71) aye, ast va, det au, urs ue 
thdmmna, thali cokk, roti thopp, pathia potth. 
/TH AHn hM N: AA, TH AA LIY - CH AH KK, R OW TIY - TH AH PP, 
P AA THIY AAn - P AH TTH/ 


(72) 223. tude t ee Ves 


dUllu di daddi de do dad _dUkhde ne. 
/D UH LL UW - DIY -DAADTY -DEY-DOW -DAHnD- 
D UH KHDEY -NEY/ 


(73) fest four varous", UO at clger? 


dIlli dla dUkandara, dUdd_ ni dida? 
/DIHLLTY -DIH AA- DUH K AAND AAR AA, D UW hDD - 
NIY - DIYnh D AA?/ 


(74) de, ston His gg tae fear! 


badé, ta dos mil dur dorke diIkha! 
/B AHn D AEn, T AAn-D AHS -MTYL-DUWR-D AORHKEY - 
D IH KH AA!/ 


(75) G2k. ug>erug ug, ge gant ct Jul 


padi da pdd_paddara, tUdd taroma di vadd. 
/P AAn hD TY - D AA - P AHn hD - P AH hDD AH R AA, Tl UWn DD - 
Tl AH R AH M AAn- DIY - V AHhDD./


(76) gor, UtdH oes fra ZI 

bUddua, tiraj nal sidda tUr. 

/B UH hDD UW AA, TITY R AHJ-N AAL-SIHhDD AA - T UHR/ 
(77) 025. foxdttgae da 3d ca SA 


nimmo di nui de nokk ’te n3 tdke logge. 
/N TIHh MM OW - DIY - N UWnh- DEY - N AH KK -’T EY - N AOn - 
T AHn K EY -L AH GG EY/ 


(78) ard, S83 Stet Sag DT a Jel 


jUano, nolka nivi nUkkor ’c na hove. 
/J UH AA N OW, N AHL K AA-NIY VIYn- N UH KK AHR- ’CH-
N AA -H OW V EY/


Sri Satguru Jagjit Singh Ji eLibrary


(79) U 26. Ug UdeS, UT UTS
pac parvan, pac perdan.
/P AHn CH - P AHR V AAN, P AHn CH - P AHR DI AA N/


(80) Ysa, US Sz a wry Uy Fe SH

purna, patthe vddd ke ape pig cut li!

/P UW RN AA, P AH T:TH: EY - V AH hD:D: - K EY - AA PEY - PTYn hG -
CHI UW T: - LTYn!/


(81) us, ug-usht § fis ous uret doe

pUtt, posu-pdchia ni pid da pon pani péde.

/P UH TT, P AH SH UW -- P AHn CHH IY AAn - N UWn - P IHn D: - D AA -
P AON: - P AAN: TY - POWnh D AE/


(82) S 27. cade wv dex adteae urfimr Fh


phokirie da phUphor phoridkot6 ala si.
/PH AH K TY RIY EY - D AA - PH UH PH AH RH -
PH AHRIY D K OW T: OWn - AA TH AA - STY/


(83) gue gad ed af do AT U!


phUmon phUkra pher ki phonnti kho du.
/PH UHn M AHN: - PH UH KR AA - PHEY R- K TY - PH AH NN UH -
KH OWh - D UW/


(84) @3-feen 6 dg e du FW asd 31

phore-phInsia ne phoggu de phdg jé kUtor ’te.

/PH OW RH EY -- PH TH N: STY AAn - N EY - PH AH GG UW - DEY -
PH AHn hG - hJ EY - K UH T AHR -’T EY/


(85) fee Ha! gs wt cot S de SF gat St!
phitte mii! ph3nne x4 phoji ne phott poé phuk ’ti!


/PH IH T:T: EY - M UWnh! PH AHn NN EY - X AAn- PH AOJ TY - N EY -
PH AH T:T: - Pl OW IJHn - PH UW K -’TIY!/


(86) 3 28. ¥e-e8e 9 a nidsz wee?


bUdd-vdlet ho ke ondéber bondé?
/B UHh D:D: -- V AH hL EY T: - H OW - K EY - AH N OW hB AH RH -
B AHN: D AEn?/


(87) adds yor, det ges fa sreg?
bevkuf bUddua, b3da bonné kI bador?
/B EY VK UW F-B UH hDD UW AA, B AHn D AA -B AHN: N AEn -
K IH - B AAn D AH R?/


(88) Sfur des! yd Fe wis dufr fead? 


bolla baddala! bure cote ani bsdlla phIrdé? 
/B OW LIH AA - B AHDD AHL AA! B UWREY - CHI OWT: EY - 
AAn NX UWn - B AOn D LIH AA - PH THR D AEn?/ 


(89) 329. 3esaz3e, sa saTH TI


podu pdkor polla, pola pddari he. 
/P1 AOn D UW - Pl AOn K AH RH - PI OW LLL AA, PI OW L AA - 
Pl AHn D: AARTY -H AE/ 


(90) 3x sad, for Fad! 


poda pddaria, kinna kU par! 
/P] AHn D: AA - Pl AHn D: AA RTY AA, K IHn NN AA - K UH - PI AA R!/ 


(91) fee set 3a" gas 35 a 8s Th 


Pldo pabi péra pukona pann ke pUII gi. 
/P1 THn D OW - PI AA BIY - Pl AE RH AA - PI UW K AHN: AA - 
Pl AHn NN - K EY - Pl UH LL- GIY/ 


(92) sna set S 3H, Sct cise HE Sse Fh 
pajjke pai ni péj, puri gsbbon maj robodi si 
/P1 AH JJ KEY - PLAATY -NUWn-PIEYJ, PLUWRIY - 
G AH hBB AHN: -M AH hJJ -R AHn hB AH DIY - STY/ 
(93) 32 vist 6 aH fae Sian? 
pére dbi ni poras kI6 1ébbla? 
/P| AE RH EY - AHn hB IY -N UWn- POWR AHS - K IHOWn- 
L AH hBB IH AA?/ 
(94/150) SH Hh ga sa SI? 
pera ji! phuk poraa li? 
/P! AHR AA -J TY! PH UW K - Pl AHR AA AH -L IY?/ 
(95) H30. Hyd ct Ht, WH HT, HAT WAG, AS Te! 


mdgegor di ma, mama mami, massi masor, mele goe. 
/M AH hGG AH R - DTY- M AAn, MAA M AA-M AA MIY, 
M AA SS TY -M AAS AHRH, M EY LEY -G AH EY/ 


(96) Yan Het HOE } HIT US wi? 


mUksor moja manon nti mSdtor pdrne 4? 




/M UH KS AHR-MAOJ Aan-M AAN AHN: - N Uwn- M AHn T AHR - 
P AH hRHN EY - AAn?/ 


(97) Hfser! Hd-nent es! Hst-AA & ufdar We Jai 
mUdla! mi-mUlaza chodd! molli-més da méga mUII vott. 
/M UHn D: IH AA! M UWnh -- M UH hL AA Z AA - CHH AH D:D:! 
M AO LLIY --hM AEnS -D AA-M AEnh G AA -M UHLL- V AHT:T:/ 


(98)/(151) fie ufewrg & vd Hea a Heat Soni 


mido mUtlar ne mi motkao ke motka ponnla. 
/M THn D OW - M UH T: IH AAR -N EY - M UWnh- M AHT: K AA AH - 
K EY - M AHT: K AA - PI AH NN IH AAn/ 


(99) 31, WSS! Uda a, Wo tue SUR A 


yabbola! york na, yara de yakke ’te car ja. 
/Y AHhBB AHL AA! Y AHR K-NAA, Y AAR AAn - DEY - 
Y AH KK EY -’T EY - CH AHhRH - J AA/ 


(100) 332. ddgTHEds ddaesg Hs ssl 


rode rahi nti rete ’c6 ragdar rithe ldbbe. 
/R OW D: EY -R AA HTY -N UWn- REY T EY - CH OWn - 
R AHnGD AAR -RIY TH: EY - L AH hBB EY/ 


(101) fo wat TS Sots feat och 


rla taddi rat rohi ’c rigdi rahi. 
/R TH AAh- T:1 AAn D: TY-RAAT-ROWHIY - ’CH-RIJHnGDIY - 
R AHHIY/ 


(102) 833. Sdd ct Sst 33 BJ BTET Th 


lohore di 15ni lott hu 1Uhan ho gi. 
/LAHHAOREY - DIY -LAHnNX IY -LAHTT-LAHHUW- 
LUHHAAN: -HOW-GIY/ 


(103) oS tt wat 2 ys 'S Sst At 


lale di lali de mi ’te lalli si. 
/LAALEY-DIY-LAALTY -DEY-M UWnh-’T EY -LAALLITY -
STY/ 


(104) OS MN Ss, ToT Hl 


lala ji! la la, la la ho gi. 
/LAALAA-JTY!LAA-LAA,L AA-LAA-HOW-GTY/


(105) dS, 3 wees vst Fg Sa S SH 
gello, kIllo allu te cali kU 15g le li.
/G EY LL OW, KILL OW - AA LL UW - T EY - CH AALTY - K UH - 
LAOn G-LAE-LIYn/ 


(106) 234. Zenfonr, wfent feast dst gud fa Fes? 


vonjarla, bavla vére vana vecdé kI vajlia?
/V AHN: J AA R JH AA, B AA V IH AAn - V EYH RH EY - V AHn NX AAn -
V EY CHD AEn- K IH- V AHnhJ LIY AAn?/ 


(107) Sfoaet ot Sat vy eapet at HM | 


vérke, di vakkhi ’c vogava vdtta marla. 
/V AEh RH K EY - DIY - V AH KKHIY - ’CH - V AH G AAh V AAn- 
V AHT:T: AA -M AA RIH AA/ 


(108) we ot ged dy Suge Je 
evé ni bador vaj ’te cdraida hUda. 
/AE V EYn-NIY -B AAn D AHR- V AHnbJ - ’T EY - 
CH AH hRH AA ITY D AA - H UHn D AA/ 


(109) #35. uUpa Sagat eT ea img 


poraku ni karni da dUdd plao. 
/P AH hRH AA K UW - N UWn- K AA hRH NITY - D AA - D UHhDD - 
P IH AA OW/ 


(110) wadt Ay 8 wifyar ur ST 


dUkhdi jar ne orlkka pa ’ta. 
/D UH KH DIY -J AAhRH - NEY - AH RH JH KK AA - P AA -’T AA/ 


(111) Ba oT, B*o fee, frga wat vz 


loro na, carn dIo, jéra kori cdru. 
/L AH RH OW - N AA, CH AH hRHN - DIH OW, J EYh RH AA - 
Kl OW RH IY- CH AH hRH UW/ 


(112) 436. HdH A, Woes Hes Sal 
sorma ji, sandar sarvot chako. 
/SH AHRM AA-JTY, SHAAND AAR-SH AHRB AHT- 
CHH AH K OW/ 

(113) fey At, Adt HHS o, HaASt sal 


slg ji, S8go sarmao na, sokdjvi choko. 
/S IHn hG - J TY, SAHn GOW - SH AHR M AA OW -N AA, 
SH AH K AHn J V TY - CHH AH K OW/ 
(114/152) = - ASH SST ASH ASH famr|


sorma vala sorma sormao gla. 
/SH AHR M AAn- V AAL AA - SH AHR M AA - SH AHRM AA AH - 
GIH AA/ 


(115) Wd At, AD AST Je G, Ay vn}? 


4 ji, saho sa hoe 6, sUkkh e€? 
/SH AAh - J TY, S AA H OW - S AAh - H OW EY - OWn, S UH KHH - AE?/


(116) fano Hd d Ad Hda & Ad fus fafsyr 


KIson sa ne ser mar ke che ser klo jittla. 
/K TH SH AH N - SH AAh-N EY -SH EY R-M AAR KEY - CHHEY- 
S EY R - KI IH OW -J IH TT IH AA/ 


(117) Hat Hufsst & Har sfsnr 


sdki sexclli ne sosa choddla. 
/SH AOn K ITY - SH EY X CH IH LITY - N EY - SH OW SH AA - 
CHH AH D:D: IH AA/ 


(118) AAUA es, AA ATHE Fas Sul 


sasopaj chadd, sise samne sokal dekh. 
/SH AH SH OW P AHn J - CHH AH D:D:, SH TY SH EY - S AAh MN: EY - 
SH AH K AHL - D EY KH/ 


(119) 437. HAH Use, yRa A HS o fefinr agi 


xosoma khanie, xUsk je xot na IIkhIa kor. 
/X AH S AH M AAn - KH AAN: TY EY, X UH SH K- J EY - X AHT - 
N AA - LTH KH IH AA - K AHR/ 


(120) yAte & Hot HAT Hoth 


khUsie ne xuni xdjor xoridi! 
/KH UH SHTY EY-N EY -X UWNIY - X AHn J AHR - X AHRITY DIY!/ 


(121) dug Wie HS Yat Tw Ho Sf 
xtixar khive xan khoji da xun kholla. 
/X UWn X AAR- KHIY VEY -X AAN-KHOWJ ITY -DAA-X UWN- 
KH AO L IH AA/ 
(122) 3138. aed Thr st, gent dad gi 
godor gtjla, ta gUlami geb ho ju. 


/G AH D AHR-G UWn JIH AA - T AAn, G UHL AA MIY - 
GAEB-HOW-JUW/ 




(123) 739. met dAg, Avs Aer ve fe 


zoxmi ho jé ga, je call gla jaddu jidua. 
/Z AH X MIY - HOW - J EYn-G AA, J EY -CH AHLL -GIHAA - 
J AA D UW -J THn D UW AA/


(124) fico Pet ct, ment IC AI 


zidgi zUlmi di, zoxmi hou zorur. 
/ZYHn DGITY -ZUHLMIYn-DIY, ZAHX MIY -HOW UW - 
Z AHR UW R/


(125) afd ASeS Also faMr AFeI 


zéri zeldar jalla Jia jade. 
/Z AEHR TY - ZAELD AAR-J AHLIHAA -J IH AA- J AAn D AE/ 


(126) Bat ST TH Ue S TH ASI 


loka da raj papia de raz kholu. 
/LOW K AAn-DAA-RAAJ-PAAPTY AAn- DEY -R AA Z- 
hKH OW L UW/ 


(127) 240. sdut sdait $ saHt ow at? 


forebi fordgi nu ferza nal ki? 
/F AHR EY BIY -FAHR AHnGTY -N UWn-FAHRZAAn-N AAL- 
K TY?/ 


(128) ZHS cate , wen eH fev Suet 


fosal fakira di, ors fars vIcc Ugedi. 
/F AHS AHL-FAHKTY R AAn- DIY, AHR SH- F AHR SH - 
V IH CHCH - UH GG DIY./ 


(129) Sul. WAS wet, Ws Is wet! 


gal gal pani, gal gol khani. 
/G AHL-GAHL-PAAN:TY,GAHL-GAHL-KHAAN:ITY/ 


(130) HS HS Sect, HS HS uct 


mol mol nddi, mol mol patidi. 
/M AH L-M AHL-hN AOn DIY, MAHL-M AHL- PAA UHn DIY/ 


(131) HS HS oct S, IS Ts wt fare ust 
mol mol nddi de, gol gal a gla pani. 
/M AH L-M AHL -hN AOn DIY - DEY, GAHL-GAHL- AA- 
GIH AA -P AAN:IY./ 
(132) IS 3 TS ISH STE TS Ts atti


gelle ne gal6 golama 1aUn vali goll kiti. 
/G EY LEY -NEY -GAHLOWn-GAHLAAM AAn -L AAh UHN: -
V AALIY-GAHLL-KIYTIY/ 


(133)/(157)  - BB™ VV SST, Uas BM | 


talao ’c na na, pokore talao. 
/T AHL AA AH -’CH - N AA - N AAh, P AH K AO RH EY - T AHL AA AH/ 


(134) WoTSt US, US USI 


okali dol, dal dol. 
/AH K AALTY -D AHL, DAAL-D AHL/ 


(135) fee Set S fara ct adt Sat ot wrSeh 


bido boli ni gldde di koi bolli ni aUdi! 
/B IHn DOW -BOWLIY -NUWn-GIhDDEY-DIY-KOWIY- 
BOWLIY-NIYn- AA UHnDIY!/ 


(136) WS fle SHE saad Wo FSI 
bal pIcche pajjana choddke ogg bal. 
/B AA L - PTH CHH EY - Pl AH JJ AH N: AA - CHH AH D:D: K EY - 
AH GG - B AA L/ 

(137) Tt fonret? Tat wit Sst? 


goli lava? golli a teri? 
/G OW LIY - LIH AA V AAn? G OW LIY - AAn- TEY RIY?/ 


42. Us UTS (T/T): 
(138) WS HE, IB 3S AZT SHA asst est! 


gall sUn, gal te gall da ’laj korali cheti! 
/G AHLL-S UHN:, GAHL-TEY-GAHhLL-DAA-’LAAJ- 
K AHR AA LITYn - CHHEY TIY!/


(139) eJvew ast } fhe a, fest 63 Hig eg Al 


vor han da kUri nti mI] je, tIbbi Utte mi vor je. 
/V AHR-HAAN:-DAA-K UH RHIY -N UWn- MIHL-JEY,
T IH BBIY - UH TT EY - M JHnh - V AH bR -J EY./


(140) ao tt Ag 0 vos Aa 


kdnn di jér ’c caper jor. 
/K AHn N - DTY - J AH bRH - ’CH - CH AH P EY RH - J AH RH/ 
(141) gS 95 2 de SI


bUI bUI de bUII hille. 
/B UHL-BUHL-DEY -BUHhLL-HIHLLEY/ 


43. Us UTS (T/q): 
(142) AHet det 6 yo J a Fe oho 


sromni kovi ne prsdnn ho ke grath racla. 
/SHROW MN: TY -K AH VIY -NEY-PRS AHn NN -H OW - KEY - 
GR AHn TH - R AH CH IH AA/ 


(143) aH fone & dat Fat > uno vel 


brdm krIson ne cothi sreni nt prasen pUcche. 
/BR AHhM-KRIHSH AHN-NEY-CHAOTHIY -SHREYN:IY - 
N UWn - PR AH SH AH N - P UH CHCHH EY/ 


(144) am ocd] J fans Sebo 


bora notkhot he krIson kanoia. 
/B AH RH AA -N AHT: KH AHT: -H AE- K RIH SH AHN - 
K] AH NN AH TY AA/ 


(145) WY fy, frase, qHet wfasarg tt 


sadu slg, sehotmdd, sromni sahItkar he. 
/S AA hD UW - S IHn hG, S EY H AH T M AHn D, SH R OW MNIIY - 
S AA HIHT K AAR -H AE/ 


44. Uso (S/z):


(146) TIUT & AE, WaHS & AHS 


doropti da svdbor, orjan da sveman. 
/D AHROW PTIY -DAA-S V AHnB AHR, AHR J AHN -DAA- 
SV AEM AAN/ 


(147) A-fenea 3 fot ser J at? 
sve-vIsvas t6 bIna bada he ki? 
/S V AE -- VIHSH V AA S-T OWn- B IHN AAn- B AHnD AA -H AE- 
K IY?/ 
45. SHae Gude: 
(148/13) - FaH Sd STM ATT Se ff 


hakoem ne hés nti horas ke hora hira jlttIa. 
/H AA K AH M-N EY -H AHn S - N UWn- H AHR AA AH - KEY - 
H AHR AA-HIY RAA-JIHTT IH AA/ 
(149/29) Wat Hy sd sg Sa Hus, sing wary e Hiss


kara mag par pari tera mdggora, cajra kdrao de mlttora. 

/K1 AH RH AA - M AA hG - PI AHR - Pl AHR UWn - T EY R AA - 
M AH hGG AH R AA, CHI AAn J R AAn - Kl AH RH AA AH - DEY - 
MIH TT AHR AA/ 


(150)/(94) at Hh ga savy SI? 


pera ji! phuk psraa li? 
/P1 AHR AA - J TY! PH UW K - Pl AHR AA AH - LIY?/ 


(151)/(98) fie ufenrag & Hg Hea a Heat Soni 


mido mUtlar ne mii motkas ke motka pdnnla. 
/M IHn D OW - M UH T: IH AA R- N EY - M UWnh- M AHT: K AA AH - 
K EY- M AH T: K AA - Pl AH NN JH AAn/ 


(152/114) 9 - ASH SBT ASH ASH famr| 


sorma vala sorma sormao gla. 
/SH AHR M AAn- V AAL AA- SH AHRM AA - SH AHR M AA AH - 
GIH AA/ 


46. Hd (tone): 


(153) urst Ut & dat ut 


pani pi ke cokki pi. 
/P AAN: TY - PTY - KEY -CH AH KKTY - PTYh/ 


(154) wd ut a uM oS vat ut! 


ca pi ke cao nal cokki pi. 
/CH AAh - PTY - K EY, CH AA AH - N AAL- CH AH KKITY - P [Yh/ 


(155) 3a Fa DMS 


tii vi vi rUpoie le la. 
/T UWn - VIY - V TYh-R UH PP AHTY EY -L AE-L AA/ 


(156) TI SIS, fan 4 Sst = at Sat! 


va pogvan, kIse nt tatti va ni loggi! 
/V AAh-PIAHGV AAN, KIHS EY -N UWn-T AHTTIY - V AA- 
N T¥n-LAHGGIY!/ 


(157/133) = SB™ TS Sd, uUas som! 


talao ’c na na, pokore talao. 
/T AHL AA AH -’CH - N AA - N AAh, P AH K AO RH EY - T AHL AA AH/ 
47. H3d / matras(9:a::1:1::U:u::0:9:e: €)


(158) aH ad a ard 'T ¥e Be 


k3mm kor ke kar ’c cute lot. 
/K AHn MM - K AHR-K EY - K AAR -’CH- CHI] UW T: EY - L AH TYn/ 


(159) fod a Bt AT dtd 


clr na la, sag cir. 
/CH TH R-N AA-LAA, S AAG- CHTY R/ 


(160) G2 yg, Hs UT! 


oe sur, sUr cc ga! 
/OW EY -S UWR, S UHR -’CH-GAA!/ 


(161) Ss VE w aH Ss a, Act GE HSI 


lire ton da k3mm chodd ke, meri t5n mol. 
/LTY RH EY - Tl OWN: - D AA - K AHn MM - CHH AH D:D: - K EY, 
MEY RIY - Tl AON: -M AHL/ 


(162) Ad ag a Ag to ue 


ser kor ke ser dUdd pio. 
/S AER-K AHR-K EY-S EY R-D UHhDD - PTY OW/ 


(163) 48. WH WHT a ast TS, TH Ire TH 
kUmm kUmaz ke kiti gall, gUmm gUac gi. 
/K1 UHn MM - KI UHM AA AH- KEY - K TY TIY -G AHLL, G UHn MM - 
G UH AA CH-GIY/ 

(164) AYE Sa Sry-ByIS ne ot cuge S| 


sdgne bag ’c bag-bogelle a i tapakde ne. 
/S AHn hG N: EY - B AA G -’CH - B AAhG -- B AH GIEY LEY - AA - IY - 
T AHP AHK DEY -NEY/ 


(165) HAS vat od UA ao a AA Cutt 


mosul cUgi ’cé pese bachao ke més cUgi. 
/M AH S UWL - CH UHn GTY - ’CH OWn - P AES EY - BAH CH AA AH - 
K EY -hM AEn S - CH UHn hGTY/ 


(166) TS ad, WS fAS1 
gol koro, kol jItto. 
/G OW L- K AHR OW, KI OW L - J IH TT OW/


(167) TS TS agg, US o aI 


gall gol gabborua, kd] na ker. 
/G AH LL-G AOL-GAHhBB AHR UW AA, KI AOL -N AA- K AHR/ 


(168) 49. da Afaer, st rusts Afri 


godda sUjjla, ta hospotal sUjjla. 
/G OW D:D: AA -S UH JJ TH AA, T AAn- HAHS P AHT AAL- 
S UH hJJ IH AA/ 


(169) gee, He dS ¥e eS wT Ua Bel 


bujra, jott ne cdtt badle da cddda cakk lene! 
/B UW hJ RH AA, J AHT:T: - N EY - CHI AHT:T: - BAH DLEY -D AA- 
CHI AHn D: AA - CH AH KK -L AEN: AE!/ 


(170) «3450. dat fey dS get rar 


gdda slg ne gdda khada. 
/G AHn D: AA - S THn hG - N EY - G AHn hD: AA - KH AA hD AA/ 


(171) Het savas UT 


sUd de dabbe ’c sUd pe ge. 
/S UHn hD: - D EY - D: AH BB EY - ’CH - S UHn D: - PAE- GEY/ 


(172) odt-Ha 2 Has ot ast Ceahl 
kUdi-més de sdgal di kUdi tUttgi. 
/K UHn hD: TY -- hM AEn S - DEY -S AHn GAHL-DTIY-K UHnD:IY - 
T: UH T:T: GIY/ 


(173) uot "0 cut Yat Sal, fa det fasurs ? 


tuhi ’c tédi khiidi thoka, kI khUdi kIrpan ? 
/T1UW HIY -’CH - T EY hD: IY - KH UWn D: TY - TH: OW K AAn, 
K IH - KH UHn bD: IY - K IHR P AA N?/ 


(174) zat dat 3 Sst Uat, St foot 6 Vet Sch 
voddi kUri ne cholli ctidi, ta nlkki ne cidi vaddi. 
/V AH D:D: TY - K UH RHTY - NEY - CHH AH LLIY - CH UWn D: TY, 
TAAN-NIHKKITY - N EY- CH UWn bD: IY - V AH hD:D: TY/ 


(175) «51. GR, aqoatde a dos a
tido, gadok de gad di g3d 6 bac. 


/TITY DOW, G AHn hD AH K-DEY-GAHnD-DIY- 
G AHn hD - T OWn - B AH CH/ 
(176) we add, fa Wet Bae?


dava korné, kI tava boloné? 
/D AA V AA-K AHRN AEn, KTY - T] AA V AA - BOW L AHN: AEn?/ 


(177) wot oa te Sur age a det fAul 
ari da dada tIkkha koron da t3da sIkkh! 
/AA RTY -D AA-D AHnD AA -T IH KKH AA- K AHR AHN-D AA - 
TL AHn D AA- S IH KKH!/
(178) ve ana Ye Ot Stl 
don kas ke tdn to Ii. 
/D AON: -K AHS KEY- TIAON:- TIOW- LIYn/ 
(179) #52. wufss sit ae, eg nia SI 
péla dabbi pal, pher ogg bal. 
/P AEh L AAn - D: AH BBIY - P1 AA L, PH EY R - AH GG- B AA L/ 
(180) eet ufafent, at avet HOI 


bana pénle, ta pana mdnn. 
/B AAN: AA - P AEh NIH AE, T AAn - P] AA N: AA - M AHn NN/
APPENDIX B


CORPUS (35 SETS: 38Mr) 


(1) 8 Gun ad ot, GH S30 fea U atti


dom vire di, tUmm ladon vicc pe gi. 
/UW hD AHM - VIY REY - DIY, Tl UH MM -L AHn D: AHN - 
V IH CHCH - P AE- GIY./ 


(2) Cy Ay o feast, flag Gust ISI 


Ugg sUgg na nIkkli, klddor Uddalgi bacono. 
/UHhGG -S UH hGG-N AA-NIHKKLIY, K IH hDD AHR - 
UH hDD AHLGITY - B AH CH AH N OW./ 


(3) Ge fas o ure ise, ad da At Ag t gs eg 
od6 kI6 na al6 milttora, jad6 r3g si soré de phUII varga 
/OW D OWn - K IH OWn- N AA - AA TH OWn - MIH TT AHR AA, 
J AH D OWn -R AHnG- SITY -S AHhR OWn - DEY - PH UH LL - 
V AHRG AA/ 


(4) ~~ «bp ucege dt, fg fis 2 wrae niger 
okkh potvaron di, jI6 Ill de alone ada 
/AH KKH - P AHT: V AAR AHN - DIY, J IH OWn - IH hLL - D EY - 
AA hL AHN: EY-AAn D: AA/ 


(5) wg & Soho way St flup-ush, fan as dis a act 


a le nattia kdrao li pIppal-pottia, kIse kole gall na kori 

/AAh-L AE-N AH TT IY AAn - Kl AH RH AA AH -LIYn - 

PITH PP AHL -- PAHTTITY AAn, K IHS EY -K OWLEY -G AHLL- 
N AA - K AHR JHn/ 


(6) we 8 ea fsa, gat Au a arent 
a le phor mIttora, baka mec na aia 
/AAh - L AE - PH AH RH-M IH TT AHR AA, B AAn K AAn- M EY CH - 
N AA - AATY AAn/ 

(7) #«& fa 3a dat yAet, eA aS for ast fed voe


Ikk tera rig mUski, duja da IIa goli vicc carkha 
/IH KK -T EY RAA-R AHn G-MUH SH KITY, D UW J AA - D: AAh-
LIH AA -G AHLIY - V IH CHCH - CH AH R KH AA/ 
(8) ffa Sct fae ses, foe AS H Sag 2 Afset
Ikk teri jld badole, méne sare mé tabbor de sédi 


/IH KK - TEY RTY - I THn D- B AHD AHLEY, MEY bN: EY -
S AAR EY - M AEn - T: AH BB AHR - DEY - S AEnh DIY/ 


(9) FA He ufews-Ad, dst fea staat 


sut potlala - sahi, pera vicc cajra 
/S UW T: - P AHT: IH L AA -- SH AA HTY, P AER AAn - V JH CHCH - 
CHI AAn JR AAn/ 


(10) As dat ste ty a, Act Seat "3 HITS Tet 


satt rdgi chit dekh ke, joatti hotti ’te ssrabon hoi 
/S AH TT -R AHn GIY - CHH TYn T: - DEY KH K EY, J AH T:T: IY - 
H AH T:T: IY -’T EY -SH AHR AAB AHN: -H OWTY/ 


(11) WY Je da adi, us ae at ag or ude 


sadu hUde rabb varge, kUd kodd ke xer na paie 
/S AA hD UW -H UHn D E- R AH BB- V AHR GEY, KI UHnD: - 
K AHhD:D: - K EY -X AER-N AA-PAATY EY/ 


(12) AUS HSS HSH, SSE St SSET II 


sUgor sUdol sUnokkhi, tolna nahi lébboni. 
/S UH hG AH RH -S UH D: AOL-S UHN AH KKHIY, T:1OWN: AA - 
N AH H JHn - L AH hBB N: IY./ 


(13) J Tod o HEee, foto ae famr


hari na molvene, gldda har gla 
/H AARTYn-NAA-M AHL V AEN: EY, GIH hDD AA -H AAR- 
GIH AA/ 


(14) Toa Woe dad eS, taut a wet A ag 
haka marde bokkoria vale, dUdd pi ke jai je kUre 
/H AA K AAn-M AARDEY -B AH KK AHRITY AAn- V AA LEY, 
D UH hDD - PTY -K EY -J AATYn-J AE- K UHR EY/ 

(15) JA a a Sy sgh, Act AA SaH ot HST 


hoss ke na 14g veria, meri sass pdroma di mari 
/H AH SS- KEY -N AA-L AHnhG- V AERTY AA, MEYRIY - 
S AH SS - PlLAH RAHM AAn-DIY-MAARITY/ 


(16) @ focal i Sra sadh, Ja Aa a dies 6 ust 


kéri € tt’ sag tordi, hotth soc ke gddal nti pai 
/K EYh RHIY - AEn - TUWn -S AA G- T OW RHDIY, H AH TTH -
S OW CH - K EY -G AHn D AHL - N UWn- P AA IYn/


(17) aat (6H 6 USA Bae, fess Sri e


kori nimm nit potase logde, vére choarla de
/K AO RHIY -N IHn MM -N UWn- P AHT AAS EY -LAHGDEY,
V EYh RH EY - CHH AH RH IH AAn - D EY/


(18) ae Tea o dude wot, ys Tat stg ae


kode hak na cddorie mari, cure vali ba k4dd ke
/K AHDEY -H AA K-N AA-CH AHnD AHRIY EY -M AARITY,
CH UW RH EY - V AALTY -B AAnh- K AHhD:D: - K EY/


(19) YAH S uve ast, war da B He 'S uT a

khosma nt khan kUria, kdra cakk lt mon ’te tar ke

/KH AH S M AAn - N UWn - KH AAN: - K UH RHTY AAn - KI AH RH AA -
CH AH KK -L UWn-M AON: -’T EY - Tl AHR - K EY/


(20) ag Teal d A, fagat we 3 wet


khore raekoti ne jito, kéri cat ’te lai
/KH AO R EY -R AA EY K OWT: ITY -NEY - J IY T OW, K EYh RHIY -
CH AAT: -’T EY -L AA TY/


(21) dst Bact Jom BE Ts, oI Yass wT


goddi cdrdi pdnaa lse godde, cao mUklave da
/G AH D:D: TY - CH AH hRH DIY - Pl AHN AA AH -L AHEY -
G OW D:D: EY, CH AA AH-M UH KL AA VEY -D AA/


(22) as a a fa 2 3a, 9a NS fsa Es


gal logg ke vir de roi, pabo meni clrak ditta
/G AHL-LAHGG-KEY-VIYR-DEY-ROWEY, PI AAB OW -
M AE N UWn - CHI IH RH AH K - DIH TT AA/


(23)

kUd kadd la patton ’te khorie, pania ni ogg logg ju

/K1 UHn D: - K AH hD:D: -L AA - P AH TT AHN: -’T EY -
KH AH hRHIY EY, P AA N: TY AAn - N UWn - AH GG - L AH GG- J UW/


(24) we ust flmr @ ot, Agete war eddie arg


kUtt pani pla de ni, sonié kdra pdrédie nare
/K1 UH T:T: -P AA N: TY - PIH AA - DEY - NIY, S OWhN: EYn -
Kl AH RH AA- Pl AH EYn DIY EY-N AAR EY/


(25) we flis o Has Aret, fefemit a de ofr
kUddu pld na sdgere jani, tIbbla ’c pen kassia
/K1 UH DD UW - P IHn D: - N AA - S AHn hG EY RHEY - J AA N: TY,
T: IH BB IH AAn - ’CH - P AEN: - K AHSS IY AAn/


(26) we AT St et, Wet Paz St 


kar ja kUtti sdgi, kUdde réger di. 
/KI1 AHR -J AA - KI UH T:T: TY -S AHn hGTY, KI UH DD EY - 
R AHn hG AH RH - DITY./ 


(27) wgMr cast 'S, Wait ad uf yf 


kUggua tali ’te, kUggi kore ku ku 
/K1 UH GG UW AA - T: AAHLTY -’T EY, K1 UH GGTY - K AHR EY - 
Kl UWn - KI UWn/ 


(28) us de uly godt, oS AWE Assi Agel 


kUd kéddd pig cutdi, val sdgne kolola karde. 
/K1 UHn D: - K AH hD:D: - PTYn hG - CH] UW T: DIY, V AAL - 
S AHn hG N: EY - K AH LOW L AAn- K AHR DEY,/ 


(29) 8 ape Sseaeele at, aos ta’ Hise ust 


kSnn chonkatidie ni! katé dUdd ’c minana paUdi. 
/K AHn NX N: - CHH AH N: K AAUHn DIY EY - NITY, K AAh T OWn - 
D UH hDD -’CH-MTY NX AHN: AA - P AA UHn DIY./ 


(30) U vq WT a3 3, Hos sac ont


car ja bote ’te, mdnn le por di akhi 
/CH AH hRH -J AA-B OWT EY -’T EY, M AHn NN -L AE- PI AOR- 
DIY - AA KHIY/ 


(31) vo Se fos URE, WS HAS UE IoT 
cann pavé niItt corda, santi sojjona de baj hanera 
/CH AHn NN - PI AA V EYn - N IH TT - CH AH hRHD AA, S AA N UWn - 
S AH JJ AHN: AAn - D EY- B AA hJ -H AHN EY R AA/ 
(32) fee de JHE od Hse, Sat 32 Ha age 
cltte d3d hassono nahi réde, loki pére sokk karde 


/CH IH T:T: EY - D AHn D-H AH SS AH N: OW - N AHH IHn - 
R AEnh D EY, LOW KTY - Ph AERH EY - SH AH KK - K AHR DEY/ 


(33) US USE Aaa AS, Hs STH das 
call callie jarog de mele, mUda tera mé cokk Iti 
/CH AH LL - CH AHLLTY EY -J AHRAHG-DEY-MEYLEY, 
M UHn D: AA - T EY R AA - M AEn - CH AH KK - L UWn/


(34) 3s ea o sidat-dait, fea wear gafourt A
cher ke parlda ragia, kItthe jaéga bubonla sada 
/CHH EY RH - K EY - Pl AHR JHn D: AAn -- R AHn GIY AAn, 
K JH TTH EY - J AA EYn G AA - B UWB AH NIH AAn- S AA hD AA/ 


(35) ac feGg ux dee, drdtt te free 


chota dlor bora tUtt-pena, hassodi de ddd gInda 
/ CHH OW T: AA - DTH OW R- B AH RH AA - T: UH T:T: -- P AEN: AA, 
H AHSS AHDIY - DEY -D AHn D- GIHN: D AA/ 


(36) stant & fest Sct, wufad ties as a 


chorla ne dIlli 1Utti, dUpére diva bal ke 
/CHH AH RH IH AAn - N EY -DIHLLIY -L UHT:T: IY, 
D UHP AEhREY -DIY V AA-B AAL- KEY/ 


(37) Sa U fads Us Hey, ads duct ded resi 


chore da kéra pUtt morju, katé kaddodi cddorie gala 
/CHH AH RH EY -D AA-K EYh RH AA-PU TT-M AHRJ UW, 
K AAh T OWn- K AH hD:D: AH DIY- 


(38) F Hot 4S St Hs ot Vodt, 3g UATSE ct
jUtti kholl di morora nohi calldi, tor pajaben di 


/J UH TT TY - KH AHLL - DIY -M AHR OW RH AA -N AH HIYn- 
CH] AH LL DIY, TOW R - P AHn J AA B AHN: - DIY/ 


(39) faAge Hz wat oS ast, wot tt Jest SSS 


jlona mor dUkhia da beli, saha di hoveli 1Uttda. 
/JTHOW N: AA-M AO RH-D UHTY AAn-DAA-BEYLIY, 
SH AAH AAn-DIY-HAHVEYLIY-LUHT:T: D AA/ 


(40) nv fscud Aadwudarse 


jeto da kIla topa di, je kaddi ma di gal ve 
/J AET OW -D AA-KITHLAA-T: AHP AA- D UWn, J EY - 
K AH hD:D: TY -M AAn- DIY -GAAL- V EY/ 


(41) Act acaud tt, 3 yHE wesH wT 


jotti kotkopure di, te bamon Sborser da 
/J AH T:T: TY -K OW T: K AH PUWR EY - DIY, T EY - hB AA M AHN: - 
AHn B AHR S AHR -D AA/ 
(42) 8 Be 8 (88, ata Uy ow Jes


cut le bIllo, ni mUdda pig da hUlara 
/CH1 UW T: - L AE - B IH LL OW, N TY - M UHn D:D: AA - P TYn hG - 
D AA -H UHL AAR AA/ 


(43) BHA Fe Se, Udt yodt U1 


cUmke cute léde, pokkhi caldi de. 
/CHI UH M K EY - CHI UWT: EY -L AEnD EY, P AH KKHIY - 
CHI AHL DIY -DEY,/ 




(44) Bde Hd U, Fe, ae aT Ell 


cdgra cajor da, cUdduo, ked korao du. 
/CH1 AH G RH AA - CHI AAn J AHR - D AA, CHI UH D:D: UW OW, 
K AED - K AHR AA AH- D UW./ 


(45) 2 He vse ASS ads u, S HHS API 


jen corgi sorabe kortar di, che sarbale sajge. 
/J AHn NJ -CH AH hRH GIY -S AHR AA hB EY - K AHRT AAR -DIY, 
CHH EY -S AHRB AAhLEY -S AHJ GEY./ 


(46) < oot Sat stant &, os ca S vat esr 


tolli aUdi chorla di, nala tg le kUgorua vala 
/T OW LLTY - AA UHn DIY - CHH AH RH IH AAn - DIY, N AAL AA - 
T: AHn G - L AE - K] UHn G AHR UW AAn- V AAL AA/ 


(47) eeaag afs at ago, set egar Aa a att 


tUtt ke na bé ji virna, péna vorga sak na koi 
/T: UHT:T: - K EY -N AA-B AEh-JTYn- VIY RN AA, PI AEN: AAn - 
V AHRGAA-S AA K-N AA- K OW TY/ 


(48) 3 vigor Ho Hla", SS Sdd TH (ode = fU3S tas YECE TB") 


cérla jeth mohina, thath thothere di. 
/CH AH hRH IH AA - J EY TH: -M AHHTY N AA, TH: AA TH: - 
TH: AH TH: EY R EY - DIY./ 


(49) 3 3d 3d oa fAG o hd, Afsst A yoe UST 


dor dor ke jIU na hire, séti vi mUrad paugi. 
/D: AHR -D: AHR-K EY - J TH UHn- N AA-HIYREY,S AEhTIY - 
VIY-MUHRAAD-PAAUWGIY/ 


(50) stant & an afr, 28 SS MST S aI 
dodla da kes banIa, dekho dUdde omli de kare.
/D: OW D: IH AAn- DAA -K EY S- B AHN: IH AA, DEY KH OW - 
D: UH D:D: EY - AHMLIY - K AAR EY./ 


(51) 3at Ue & yard za, Sct Act at fegetl 


didi pItt na cUbare cédrke, teri meri nahi niboni. 
/D: Aon D: TY - PITH T:T: - N AA - CH UH B AAR EY - CH AHhRH K EY, 
TEYRIY-MEYRIY-NAHHTYn-N JIHbB AHN: IY,/ 


(52) @ wet fea S AeA OS VSet, dast HS HS ct 


tai dIn na jovani nal calldi, kUrti mal mal di 
/T:_ AA TY -DIHN-NAA-JAHV AANIY-NAAL-CHALLDIY, 
K UH RH TIY -M AHL-M AHL-DIY/ 


(53) 2S He OS, de a Ba at 


tol sorabi nal, gorie na lor ni 
/T:1 OW L-SHAHRAABIY- NAAL,GOWRIY EY -N AA- 
L AH RH-NTIY/ 


(54) fe8 Saat HTS, uret uk SST 


tilld kdddoda gere, pani pi thdda. 
/T:1 TH LL OWn - K AH hD:D: AH D AA - GEY RH EY, P AAN: ITY - 
PITY - TH: AAn hD: AA./ 


(55) 2S JMS d, SAS de Se "STI 


tol rdgile ne, ndddi6 hir bona ’ti. 
/T:1 OW L-R AHn GIY LEY - NEY, N AHhD:D: IY OWn - HIY R - 
B AHN: AA -’TITY,/ 


(56) = Adel See ad sgAeaHe, Arete Ae fori 


soni nonod kohe parjaie, jani-jan jan gla. 
/S OWHN: TY -N AHN: AHD- K AHHEY - PIAHRJ AATY EY - 
JAAN:IY --J AAN: -J AA N: - GIH AA/


(57) 3 39 8a & fimr eras, TSM ds IS sas 
tere 15g da pla IIskara, halia ne hal dokk le 
/TEY REY-LAOnG-DAA-PIHAA-LIHSH K AAR AA, 
H AALTY AAn-NEY-H AHL-D: AH KK - LEY/ 

(58) 3d Halos a, F ot wet 33 


tor sUkinon di, ti ki jandi péde 
/TOWR-SHUHKITIYN AHN: - DIY, TUWn-KIY-J AAN: DIY - 
PI EY D: EY/ 
(59) Sfmt & 2o de a, de dine US feu ot


tUmml di vel vdd ke, jatt bijde khet vIcc norma 
/T UHn MM IH AAn - DIY - VEY L- V AH hD:D: - K EY, J AH T:T: - 
BIY J DEY- KHEY T- V IH CHCH- N AHR M AA/ 


(60) So fllz andt &- Het, Aa, ear 
tinn pid k3jra de: mohi, jagpUr, dakha 
/T IHn NN - PIHn D: - K AHn J R AAn- DEY --M OW HIY, 
J AAn G P UHR, D AA KH AA/ 


(61) &@ as feu srt 6, fe dost de FeAl 


thal vice pUjjodi ni, mI] pUnnona robb bonke. 
/TH AH L- V IH CHCH - Pl UH JJ DIY - N UWn, M IH L- 
P UHn NN AHN: AA - R AH BB -B AHN: K EY./ 


(62) a ot dug ue, dye Fas el 
tha tha thopper péde, thSmmon thothle de. 


/TH AAn - TH AAn - TH AH PP AH RH - P AEnD EY, 
TH AHn hMM AHN: - TH AH TH LEY - D EY/


(63) = feGg ere a Ue, Het Ta S 


dlor vason na deve, soni pabo nt 
/DIHOWR-V AHS AHN:-N AA-DEY VEY, S OWHN: ITY - 
P] AA B OW - N UWn/ 


(64) cdHS tI A, Pe Hae I 


dorshon dé dipo, de de s5k de gere. 
/D AHRSH AHN -DEYh- DIY POW - DEY - DEY - SH AOn K - DEY - 
G EY RH EY./


(65) OS ad tug a, Sa uses Bes Ut aH 


ton kUr doder di, lokk potla badon di pari 
/TlAHN-KUHR-DAOhD AHR- DIY, LAH KK-P AHTLAA- 
BAHD AHN-DIY-PIAARTY/ 


(66) Ge ead us ad 33, aut fea ae FT TAH 


tade vddge tan kUre tere, kdda vicc ked ho gai. 
/Tl AHn D EY - V AHhD GEY - TIAHN-KUHREY-TEYREY, 
K AHn hD AAn - V IH CHCH - K AED - HOW -G AHTY.,/ 


(67) OHa ue Ades, Wey USE CH 


tamok pove sorkare, onokh pajabon di. 




/T1 AH M AH K -P AH V EY -S AHK AAREY, AHN: AH KH - 
P AHn J AA B AHN: - DIY,/ 


(68) 8 at ufenrs sar ga, ast dg sae" 


ni potlale vala raja, bolli hor bolda 
/N TY -P AHT: IH AALEY -V AALAA-RAAJ AA, BOWLLIY - 
HOW R-BOWLD AA/ 


(69) os Ue de 838, 39 ubed ants eS 


Nabe die bad botole, tenti pinge nosiba vale 
/N AA hB EY - DIY EY - B AHnD-B OWT AHLEY, T AEN UWn- 
PTY N: GEY -N AHSTY B AAn- V AALEY/ 


(70) at He 3d aH S Ha, aS SS we fomr user 


ni mae tere ksmm na mUkke, kdthe vala a gla poraUna 
/N IY -M AA EY - T EY REY - K AHn MM -N AA-M UH KKEY,
K AHn TH: EY - V AA L AA - AA - GIH AA - P AHR AAh UH N: AA/ 


(71) ou Ust sfaot a, stag ur ur vadt 


pori chorla di, cajor pa pa cardi 
/P AO RHTY - CHH AH RH JH AAn - DIY, CH] AAnJ AHR-P AA - 
P AA - CH AH hRH DIY/ 


(72) ura SU AGH, So aH at AUS sd 


pa ke lopp sUrma, tera ksmm ki sad de dere 
/P AA- KEY -L AH PP- SUHRM AA, T EY R AA - K AHn MM - KIY - 
S AAhD-DEY-DEYREY/ 


(73) uret Hat Eo fed, Aa fae est sare 
pani mage dUdd didia, jagg jlin voddia parjaia 
/P AAN: IY -M AHnG EY - D UH hDD- D IHn DIY AAn, J AH GG- 
J IH UWnN: - V AH D:D: IY AAN - PL AHR J AA TY AAn/ 


(74) + 8 ad chee usr, frat se t fess aug 
phUIl vagi rén khIria, jIna pabia de dlor kUare 
/PH UH LL - V AAn G UWn - R AEh N: - KH JH RH TY AAn, J TH hN AA - 
Pl AA BIY AAn - DEY - DIHOWR - K UH AA REY/ 


(75) ant Toa, 2a Jd Wet, ea Sot SS TSH 
phasi hargi, phoj her jani, phe teri vari ranié. 


/PH AAnS TY -H AARGITY, PHAOJ-HAHR-J AAN: TY, PHEY R- 
TEYRIY-V AARTY -RAAN: TY EYn/ 
(76) & uy yal 6 waht We, Sess "CHE 'S
bUra bUri nti karisi jave, |Uddehane ’teson ’te 
/B UH hRH AA - B UH hRHIY - N UWn- KI AH RHTY SITY -J AA VEY, 
LUH DD EY HAAN: EY -’T: ESH AHN: -’T EY/ 


(77) ay one &, wea Het a det 
baj nasiba de, kUttke jopphi na pédi 
/B AA hJ -N AH STY B AAn - DEY, KI UH T:T: K EY - J AH PPHIY - 
N AA -P AEn DIY/ 


(78) ay onte t, ge vor t dell 
baj nosiba de, bUjan candi de dive. 
/B AAhJ -N AH STY B AAn - DEY, B UHhJ AH N: - CH] AHN AAn - 
DEY -DIY VEY./ 


(79) ade edreacmr, AS USA aS} At ASH 


bacona phoridkotia, jine pUlos kUtti si sari 
/B AH CH AHN AA - PH AHRTY DK OWT: TY AA, J TY hN EY - 
PUHL AHS-K UHT:T: IY -STY-S AARIY/ 


(80) 3g 3d 3d Ja HSM, Td Jad Ae’ ot Sfser 
per par vid mUthia, gore rdg ne soda ni réna 
/P1 AHR - P! AHR - V AHn D: - M UH TH: IYn, GOW R EY - R AHnG - 
NEY -S AHD AA -NIY -R AEhN: AA/ 


(81) fad dash sur 3 ast, fad fisdiay seb 
kIvé cabolia pua te patijia, kIvé mIlgéba bonia. 
/K TH V EYn - CH AAn hB AH LTY AAn - PI UW AA - T EY - 
Pl AA TITY JTY AAn, K TH V EYn - M IHL GOW bB AA - 


B AHN: TY AAn,/ 


(82) ret Sse, AS Se SITS 
plijgi pagpari, sab loi parjaie. 
/PIIH JI GIY - PlAAG PI AARIY, S AAn hB - L AHTYn- 
PIAHRJ AATY EY/

(83) UH, SIS, ASS, SH IA vzell 


udom, pdagat, sorabe, phasi hass carde. 
/UW hD AH M, Pl AH G AH T, S AHR AA hB EY, PH AAnS IY -H AHSS - 
CH AH hRHD EY./ 
(84) sds-sHt 6, Star sas, ATS"


parot-pumi ni, lobla paget, soraba. 
/P1 AA R AH T -- P!UW MIY - N UWn, L AH hB IY AA - Pl AHG AHT, 
S AHR AA bB AA/ 


(85) oH AS HAH U, US VSS SEe fenr Aa 


mele mUksor de, coll callie nanad dla vira 
/M EY LEY -M UH KS AHR- DEY, CH AH LL - CH AHLLIY EY - 
N AHN: AHD -DITY AA- VIY R AA/ 


(86) Hed fsa a, ds yas dale 


motor mlIttara di call, bornale callie 
/M OW T: AH R- MJH TT AHR AAn - DIY, CH AH LL - 
B AHRN AALEY - CH AHLLITY EY/ 


(87) Ys od ct fag & AWS, fers a 8 femr 33 & fect 


mUda rohi di kIkkor da jatu, vlad ke le gla tut di chIti 
/M UHn D: AA-ROW HIY - DIY -K IH KK AHR -DAA-J AAT UW, 
V IH AAh- KEY -L AE- GIH AA - T UWT - DIY - CHH IH T: TY/ 


(88) Hee afenr a, Ahad ae “st Usd ASH 


modon kdkIa da, jine kUtt ’ti pddori sari 
/M OW D AH N - K AOn K JH AAn - D AA, J TY HN EY - K UH TT: -’?T TY - 
P AHn D: OWRIY -S AARTY/ 


(89) HEM Std ST, Sta Tue" TaTHt SST 


mUnasi dag6 da, dag rakhda gddassi vali 
/M UH N: AH SHIY - D: AAn G OWn - D AA, D: AAnG - R AH KHD AA - 
G AHn D: AA SSTY - V AALTY/ 


(90) wd dedi fsad rd, fee $ fears Fale 


yar honge mlIlnge ape, dIl nti tIkane rakhie 
/Y AAR-HOWN: GEY - MIHLNGEY - AAPEY, DIHL-N UWn- 
T JH K AA N: EY- R AH KHIY EY/ 


(91) Wd Wed Ga "3 Uza, Has S US Ja" 


yar jange yokke ’te corke, yabbala nti pou tUrna. 
/Y AAR-J AAN: GEY - Y AH KK EY - ’T EY - CH AH hRH K EY, 
Y AH hBB AH L AAn - N UWn- P AH UW - T UHRN AA// 
(92) JF dHetass, Hae ast asee 


rag de kale nt, mog]I6 kali karaie 
/R AHn G - DEY - K AALEY - N UWn, M OW GIH OWn - K AHLIY - 
K AHR AATY EY/ 


(93) oe Hat dD fame, dat Hest ust 


raja jogi ho gla, kanni mUdora paia 
/R AAn hJ AA - J OW GIY - H OW - GIH AA, K AHn NN IYn - 
M UHn D AH R AAn - P AATY AAn/ 


(94) Te 3a ds ea, at FH Sd od Sa ner 


rano teri gall varga, ni mé beria ’c6 ber Hada 
/R AA N: OW - TEY RIY -G AHhLL - V AHRG AA, NITY - M AEn - 
B EY RIY AAn - CH OWn -B EY R - LJH AAn D AA/ 


(95) @&@ Bee As vad, da fest ct aget 


1Uddehane ko] dUggori, ris dIlli di kordi 
/L UH DD EY H AAN: EY -K OWL-DUHGGAHRIY,RIY S- 
DIHLLIY -DIY-KAHRDIY/ 


(96) Bel We WY age tas, HS A fiz tHe 


loddi jade ¢ karob de tade, ras le ge pld de mUde 
/L AH DDIY -J AAn DIY - AE- K AHRH AH B- DEY -T: AAnD: EY, 
R AHS -LAE-GEY -PJHnD: - DEY -M UHnD EY/ 


(97) Sa sot use’ fas, ag Ase So AST 


lokk tera potla jIha, par sén na joga 
/L AH KK -TEY R AA-PAHTLAA-JIHHAA, PI AAR -S AEhN: - 
N AA -J OW G AA/ 


(98) Bs ead ands foas', ufsst oat wd ge A 
loddu vdddi kacéri6 nikkola, péli pesi yar chUt je 
/L AH D:D: UH - V AHn D: DIY - K AH CH AEh R TY OWn - 
N JH KK LL AAn, P AEhLIY - PEY SHTY - Y AA R - CHH UH T:T: - J EY/ 


(99#139) FI Tew at > fs A, fest 6S Hg eg All 


vor han da kUri nti mI] je, tIbbi Utte mi vor je. 
/V AHR-HAAN:-D AA-K UHRHITY -N UWn- MIHL-J EY, 
T IH BB IY - UH TT EY - M IHnh- V AHhR -J EY/ 
(100) fea FS 2 Bat Guat gH 


vicc jagrava de, lagdi rosni pari 
/V IH CHCH - J AHGR AA V AAn- DEY, LAH GDIY -R OW SHNITY - 
PIAARIY/ 


(101) ZHe Adlat &, oS Ut Hage TS 


vason sorika da, nabe di sardari 
/V AHS AHN: -SH AHRTY K AAn -D AA, N AA bB EY - DIY - 
SAHRD AARITY/ 


(102) Z ust Dzait HAA VS Ga aA, AA AH YT STII 
pori corgi moroak nal lor ke, kar kar buha ponngi. 
/P AO RHTY - CH AH hRH GTY - M AH RH AHK -N AAL - 
L AH RH - K EY, K AA RH- K AA RH - B UWH AA - PI AHn NN GITY,/ 


(103) Yao Usa Jo aale! a uzamy ot dar 
khorka dorka ho ju kurie! na khorkao ni kUdda. 


/KH AH RH K AA - D AH RH K AA - HOW - J UW - K UHRHIY EY! 
N AA - KH AH RH K AA AH - NITY - K UHn D:D: AA./ 




APPENDIX C 
LEVINSON-DURBIN RECURSIVE ALGORITHM 


A brief description of the Levinson-Robinson algorithm is given below. When a set of
simultaneous equations involves a Toeplitz matrix, the computational work required to
solve it can be reduced by taking advantage of the special properties of such a matrix
(see Section 5.2.1.1 for these properties). The efficient solution is provided by the
Levinson-Robinson, or Levinson-Durbin, recursive algorithm, proposed by N. Levinson
[71], reformulated for computer programming by E.A. Robinson [91], and later improved
by Durbin [15, 24, 74-76, 79, 82-83, 87-89, 99].


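A p × p Toeplitz matrix is of the form:

\[
[R] =
\begin{bmatrix}
R(0)   & R(1)   & R(2)   & \cdots & R(p-1) \\
R(1)   & R(0)   & R(1)   & \cdots & R(p-2) \\
R(2)   & R(1)   & R(0)   & \cdots & R(p-3) \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
R(p-1) & R(p-2) & R(p-3) & \cdots & R(0)
\end{bmatrix}
\]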


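Clearly it does not involve p² distinct elements but only p distinct elements R(j); j = 0,
1, 2, ..., p-1. Suppose we want to solve the following set of simultaneous equations
(called Normal equations):

\[
\begin{aligned}
a_0 R(0) + a_1 R(1) + \cdots + a_{p-1} R(p-1) &= b_0 \\
a_0 R(1) + a_1 R(0) + \cdots + a_{p-1} R(p-2) &= b_1 \\
&\;\;\vdots \\
a_0 R(p-1) + a_1 R(p-2) + \cdots + a_{p-1} R(0) &= b_{p-1}
\end{aligned}
\]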


where $a_j$ ($0 \le j \le p-1$) are the only unknown quantities (in our case, linear
prediction coefficients or some other similar coefficients). In matrix form, we have to
solve:


[R] [A] = [B] 


where: 
[R] = p × p Toeplitz matrix 
[A] = Column vector $[a_0, a_1, a_2, \ldots, a_{p-1}]^T$ 
[B] = Column vector $[b_0, b_1, b_2, \ldots, b_{p-1}]^T$ 
Starting from the following initial conditions (at step n = 1): 
$s_{00} = 1$; $\alpha_0 = R(0)$; $\beta_0 = R(1)$; $a_{00} = b_0 / R(0)$; $\gamma_0 = a_{00} R(1)$; we proceed recursively from 
steps n = 2 to p. The final values obtained for j = 0, 1, 2, ..., p-1 represent the 
desired p coefficients $a_j$ ($0 \le j \le p-1$). 
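
As an illustration, the solution of [R] [A] = [B] can be sketched in Python as below. This is a minimal sketch of the standard Levinson recursion for a symmetric Toeplitz system, not a transcription of Robinson's program; the function name `levinson_solve` and all variable names are assumptions introduced here. It takes the p distinct elements R(0), ..., R(p-1) and the right-hand side [B], and returns [A].

```python
def levinson_solve(r, b):
    """Solve the symmetric Toeplitz system [R][A] = [B].

    r : first row of [R], i.e. R(0), R(1), ..., R(p-1)
    b : right-hand side b_0, ..., b_{p-1}
    Returns the list a_0, ..., a_{p-1}.
    """
    p = len(b)
    if r[0] == 0:
        raise ValueError("R(0) must be nonzero")
    aux = [1.0]           # auxiliary (prediction-error) filter, aux[0] = 1
    err = float(r[0])     # error energy of the auxiliary filter
    x = [b[0] / r[0]]     # solution of the 1 x 1 leading system
    for n in range(1, p):
        # Extend the auxiliary filter from order n-1 to order n.
        beta = sum(aux[i] * r[n - i] for i in range(n))
        k = -beta / err                       # reflection coefficient
        ext = aux + [0.0]
        aux = [ext[i] + k * ext[n - i] for i in range(n + 1)]
        err *= 1.0 - k * k
        # Extend the solution so that row n of the system is also satisfied.
        gamma = sum(x[i] * r[n - i] for i in range(n))
        q = (b[n] - gamma) / err
        x = [(x[i] if i < n else 0.0) + q * aux[n - i] for i in range(n + 1)]
    return x


if __name__ == "__main__":
    # [R] built from R(0)=4, R(1)=2, R(2)=1; [B] chosen so that [A] = [1, -1, 2].
    print(levinson_solve([4.0, 2.0, 1.0], [4.0, 2.0, 7.0]))
```

Each step of the loop costs a number of operations proportional to n, so the whole solution needs on the order of p² operations, compared with the order of p³ needed by general elimination.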
APPENDIX D 
SIFT ALGORITHM 


The SIFT (Simplified Inverse Filter Tracking) algorithm for the voiced/unvoiced (V/UV) 
decision and the pitch extraction was designed by Markel [75] in 1972. Its description 
can be found in almost every speech processing book [e.g., 15, 75-76, 82, 83, 87-89, 99]. 
The SIFT algorithm is based on Eq. (5.5) and Eq. (5.15) given as: 


(D.1) 
and 


\[ E(Z) = A(Z)\,S(Z) \]
(D.2) 


Stated in words, to the extent that $s_n$ is the output of a system well represented
by an all-pole model, $e_n$ is a good approximation to the excitation function. If
$s_n$ is inverse filtered through A(Z), the output is the prediction error (residual
error) $e_n$, which is expected to be large at the beginning of each pitch period for
voiced sounds and noise-like for unvoiced sounds. Assuming that the speech $\{s_n\}$ is
sampled at the sampling frequency $f_s$ = 10 kHz (appropriate adjustments can easily be
made for a different $f_s$), and that the pitch period lies in the range 2.5 to 15.5 ms,
SIFT can be described in the following steps:


(i) The speech signal $\{s_n\}$ is lowpass filtered through a third-order elliptic filter
[75-76] with a cutoff frequency close to 1 kHz, and the effective sampling frequency is
reduced to 2 kHz by decimation (dropping 4 out of every 5 samples) to reduce
further computation.


(ii) The above output is then pre-emphasized by passing it through a single-zero filter
$1 - z^{-1}$, to preserve the spectral characteristics of only the vocal tract [12, 74-76],
and multiplied by a Hamming window:


\[ u_k = w_k \left( s_{5k} - s_{5(k-1)} \right), \qquad 0 \le k \le \left( \tfrac{N}{5} - 1 \right) \]


(D.3) 
where $\{s_n\}$ is nonzero only for $0 \le n \le N-1$, N is equal to 400 samples, and the
Hamming window is

\[
w_k =
\begin{cases}
0.54 - 0.46 \cos\left[ \dfrac{2\pi k}{N/5 - 1} \right], & 0 \le k \le \left( \tfrac{N}{5} - 1 \right) \\
0, & \text{otherwise}
\end{cases}
\]
(D.4)


(iii) $\{u_k\}$ is analyzed by the autocorrelation method (sec. 3.1) to design a fourth-order
inverse filter (p = 4 is sufficient) to model the signal in the 0-1 kHz frequency
range. $\{u_k\}$ is then inverse filtered to give $\{d_k\}$, which is the residual
error of the fourth-order linear predictor. $\{d_k\}$ will have an approximately flat
spectrum [74-76, 87-89, 99].


(iv) The autocorrelation of $\{d_k\}$ is calculated, and the largest autocorrelation peak in
the desired pitch range (2.5 to 15.5 ms) is located. A variable threshold is used: if
the peak crosses the threshold, its location is taken as the pitch period.
Information on the previous two frames is retained for error detection. The
autocorrelation sequence is interpolated parabolically in the region of the
maximum value to obtain additional resolution in the pitch value. A frame
is declared unvoiced if its autocorrelation peaks are small and fall below the
variable threshold values.


(v) If the error detection process finds an unvoiced frame surrounded by voiced
frames, the frame is declared voiced, with a pitch period equal to the average of the
pitch periods of the two surrounding voiced frames, because such an isolated
unvoiced frame cannot occur.


(vi) The input sequence $\{s_n\}$ is 400 samples (40 ms) long, and there is a 2 to 1 overlap of
input data, implying that 40 ms sequences are processed in 20 ms increments.


SIFT has influenced almost every pitch detection algorithm designed after 1972,
in particular the Robust Algorithm for Pitch Tracking (RAPT), which has also been used in
this work.
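
Steps (ii) to (iv) for a single frame can be sketched in Python as follows. This is only a sketch under stated assumptions, not Markel's implementation: the input frame is assumed to have already been lowpass filtered below 1 kHz (the elliptic filter of step (i) is not implemented), a fixed threshold of 0.4 on the normalized autocorrelation peak stands in for the variable threshold, and the error detection of step (v) is omitted. The names `sift_frame` and `lpc_autocorr` are hypothetical.

```python
import math


def lpc_autocorr(u, p):
    """Inverse-filter coefficients [1, a_1, ..., a_p] by the autocorrelation method."""
    r = [sum(u[k] * u[k + m] for k in range(len(u) - m)) for m in range(p + 1)]
    a, err = [1.0], r[0]
    for n in range(1, p + 1):                  # Levinson-Durbin recursion
        if err <= 0.0:                         # signal already perfectly predicted
            break
        k = -sum(a[i] * r[n - i] for i in range(n)) / err
        ext = a + [0.0]
        a = [ext[i] + k * ext[n - i] for i in range(n + 1)]
        err *= 1.0 - k * k
    return a


def sift_frame(s, fs=10000, threshold=0.4):
    """Return the pitch period in ms for one frame, or None if judged unvoiced.

    Assumes s is already lowpass filtered below 1 kHz; threshold is an
    illustrative fixed value, not the variable threshold of the text.
    """
    step = fs // 2000                          # keep 1 sample out of every 5
    fs_dec = fs / step                         # 2 kHz effective sampling rate
    dec = s[::step]
    m = len(dec)
    # Step (ii): pre-emphasis (1 - z^-1) followed by a Hamming window.
    u = []
    for k in range(m):
        w = 0.54 - 0.46 * math.cos(2.0 * math.pi * k / (m - 1))
        prev = dec[k - 1] if k > 0 else 0.0
        u.append(w * (dec[k] - prev))
    if not any(u):
        return None                            # silent frame
    # Step (iii): fourth-order inverse filter and residual d_k.
    a = lpc_autocorr(u, 4)
    d = [sum(a[i] * u[k - i] for i in range(len(a)) if k - i >= 0)
         for k in range(m)]
    # Step (iv): normalized autocorrelation peak in the 2.5 to 15.5 ms range.
    r0 = sum(x * x for x in d)
    lo, hi = int(round(0.0025 * fs_dec)), int(round(0.0155 * fs_dec))
    r = {lag: sum(d[k] * d[k + lag] for k in range(m - lag)) / r0
         for lag in range(lo - 1, hi + 2)}
    peak = max(range(lo, hi + 1), key=r.__getitem__)
    if r[peak] < threshold:
        return None                            # unvoiced
    # Parabolic interpolation around the peak for finer resolution.
    y0, y1, y2 = r[peak - 1], r[peak], r[peak + 1]
    denom = y0 - 2.0 * y1 + y2
    shift = 0.5 * (y0 - y2) / denom if denom < 0.0 else 0.0
    return 1000.0 * (peak + shift) / fs_dec


if __name__ == "__main__":
    # Synthetic 40 ms voiced frame: harmonics of 100 Hz up to 900 Hz.
    frame = [sum(math.cos(2.0 * math.pi * 100.0 * h * n / 10000.0)
                 for h in range(1, 10)) for n in range(400)]
    print(sift_frame(frame))
```

In a complete pitch tracker this function would be called on 400-sample frames advanced in 200-sample (20 ms) increments, matching the 2 to 1 overlap of step (vi).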
APPENDIX E 
SPEECH SENTENCES SYNTHESIZED 


1. VOWELS (5 sentences): @ 4 & fad / td, HI/ AA (# 1, 2, 3, 24, 25) 


C1. Cts! Cue fag i? Cast ez! 


oe Ullu! tigda kI6 @? Uggli phor! 
/OW EY - UH LL UW! UWn hG D AA - K IH OWn - AEn? 
UHn GG LTY - PH AH RH!/ 


wm 2. WH! wag, nig Noa niga S up| 


omor! & kor, 5 enok ddor le a. 
/AA M AHR! AEn- K AHR, AOh - AEN AH K- AHn D AHR-LAE- AA/ 


@ 3. wage fees, feada flict 8a fourG| 


isor ote Idor, Ikvaja Ittd ethe Hao. 
/TY SH AHR - AH T EY - IHn D AHR, IH K V AHn J AA - JH T:T: AAn - 
EY TH EY - LIH AA OW/ 


24. fas aS, Aa dt 


clr na la, sag cir. 
/CH TH R- N AA- LAA, S AAG - CHITY R/ 


25. Gt ad, Hd’ U TT! 
oe sur, sUr c ga! 
/OW EY -S UWR, S UHR-’CH - G AA!/ 
2. NASALS (5 sentences): 3 € & 3 H(# 30, 34, 35, 11, 12) 
Si. BA Sat da 3S ea ST 
nimmo di nti de nokk ’te n3 tke logge. 


/N THh MM OW - DIY - N UWnh - DEY - N AH KK -’T EY - N AOn - 
T AHn K EY - L AH GG EY/ 
H12. Hwa SH, Ha HT, HAT HAG, AS Te 


mdggor di ma, mama mami, massi masor, mele goe. 
/M AH hGG AHR - D TY- M AAn, MAA M AA-M AAMIY, 
M AA SS TY -M AAS AH RH, MEY LEY - G AH EY/ 


5 30. ane seaSele at! was ta ’v Hise wt 


kdnon chonkaUdie! dUdd ’c minona na pa. 
/K AHn NX AH N: - CHH AH N: K AA UHn DIY EY! D UH hDD - ’CH - 
M TYn NX AHN: AA- N AA - P AA/ 


234. Ae val Ass aga Uh, S Hoss AAMT 


jan corgi sorabe kortar di, che sarbale sajge. 
/J AHn NJ -CH AH hRH GIY -S AHR AA hB EY - K AHRT AAR- DIY, 
CHH EY -S AHRB AA hLEY -S AHJ GEY// 


= 35. Adel eu ad SIA ae, AWel-We We fomr| 


soni nonod kohe parjaie, jani-jan jan gla. 
/S OWh N: TY -N AHN: AHD- K AHH EY - PI AHR J AATY EY - 
JAAN: IY -- J AAN:-J AA N: - GIH AA// 


3. TONEMES (6 sentences): 4X2 ¥ @ 7 F (# 6, 9, 22, 33, 36, 38) 


wy 6. Sfunrs Ad yfmrg & wat or fame) 


boglar mége kUmlar da kora kha gla. 
/B AH GI TH AA RH - MEY hGEY - Kl UH MIH AAR -D AA— 
Kl OW RH A - KH AA - GIH AA/ 


Zo. Ute det o ee afew, feadt ce aad! 
tagge de cfidia na vodd kUddla, tibori tet kor di! 
/T:1 AH GG EY - DEY - CH UWn hD: TY AAn - N AA - V AH hD:D: - 
K UHn hD: JH AA, T:1 IHn B AHRTY - T: AE T: - K AHR - D: UWn!/ 


22. Wa HY sd Bg ST Hu, sia wan ct iss 


kara mag par parti tera mdggara, cajra kdrao de mlttora. 
/K] AH RH AA - M AA hG - Pl AHR - Pl AH R UWn - T EY R AA - 
M AH hGG AHR AA, CHI AAn J R AAn - KI AH RH AA AH - DEY - 
M IH TT AH R AA/ 


B33. avon ed, yve vate 


baj nosiba de, bUjon cana de dive. 
/B AA hJ - N AHS TY B AAn - DEY, B UH bhJ AHN: - CH] AHN AAn - 
DEY -DIY VEY./ 


036. Ge ead ue ad 3d, aot fea aes Tet 


tade vodge ton kUre tere, kSda vicc ked ho goi. 
/T1 AHn D EY - VAHhDGEY - T1AHN-K UHREY -TEY REY, 
K AHn hD AAn - V IH CHCH - K AED - HOW -G AHITY.,/ 


338. SIS-HHt S, Sfart sas, ATS" II 


parot-pumi nu, lobla paget, soraba. 
/P1 AA R AH T -- PLUW MITY - N UWn, L AH hBIY AA - PlLAH G AHT, 
S AH R AA bB AA// 


4. SPECIAL SENTENCES (9 sentences): (#4, 7, 8, 10, 18, 19, 39, 41, 83) 


Ju. TAH SIA Ss sMags de ff 


hakom ne has nui horas ke hora hira jlttIa. 
/H AA K AHM-NEY -H AHnS - N UWn- H AHR AA AH - K EY - 
H AHR AA-HITIY RAA-JIHTT IH AA/ 


és 7. fetd sug dest ect eds al 


chide ne chappor ’c6 cheti cheti che kocchu phore. 
/CHH IHn D EY - N EY - CHH AH PP AH RH - °CH OWn - CHH EY TITY - 
CHH EY TITY - CHH EY - K AH CHCHH UW - PH AH RH EY/ 


o 8. Od-OU SS, Sada Saas SH 


thi-tha chodd, thik ho ke thUkk nal ré. 
/TH: UWh -- TH: AAh - CHH AH DD, TH: TY K -H OW - K EY - 
TH: UH KK -N AA L- R AEh/ 
310. 3dAH, J 3302 He aga Sas fAO ad! 


torsemo, tii totkore bad korke tokkala sidda kor. 
/T AHRS EY M OW, T UWn - T OW T K AH RH EY - B AHnD - 
K AHR K EY - T AH KK AHL AA - S IH hDD AA - K AH R/ 


518. HS HS oe, HS HS uct 


mol mol nddi, mal mal patidi. 
/M AH L-M AHL-hN AOn DIY, MAHL-M AHL-PAA UHn DIY/ 


19. ZITew gat 6 fhe A, fest 63 Hg egal 


vor han da kUri nt mil je, tIbbi Utte mi vor je. 
/V AHR-HAAN:-DAA-K UHRHIY -N UWn- MIHL-J EY, 
T IH BBIY - UH TT EY - M IHnh - V AHhR - J EY/ 


39. Wd Wed ua 3 vga, Uss } UG Ja"ll 


yar jange yokke ’te carke, ydabbala nti pou tUrna. 
/Y AAR-J AAN: GEY - Y AH KK EY -’T EY - CH AH hRH K EY, 
Y AH hBB AH L AAn - N UWn- PAH UW - T UHR N AA// 


Zui. Ust dz Had OS Saad, WH aS YS Sool 
pori corgi moroak nal lor ke, kar kar buha panngi. 


/P AO RH TY - CH AHhRH GIY - M AH RH AH K -N AA L- 
LAH RH - KEY, K AA RH - K AA RH- B UWH AA- PI AHn NN GTY// 


¢ 83. GHE gag ed of dg YT ¥! 


phUmon phUkra pher ki phonnti khé du. 
/PH UHn M AHN: - PH UH K R AA- PHEY R- K TY - PH AH NN UH - 
KH OWh - D UW/ 